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INCOMPLETE BLOCK DESIGNS WITH ROW BALANCE 
AND RECOVERY OF INTER-BLOCK INFORMATION 


W. Bo TAxvror 
Applied Mathematics Laboratory, D.S.I.R. New Zealand. 


INTRODUCTION 


The conditions for Balanced Incomplete Block Designs (Yates 
[1936]) are well known to statisticians involved in experiment design. 
They are 

(a) that no treatment should occur more than once within any block 

(b) that the number of blocks in which any specified pair of treat- 
ments should occur together should be a constant (A) for all possible 
treatment pairs 

(c) that all treatments should occur an equal number of times. 

Such an arrangement permits the elimination of systematic block 
differences from comparisons of treatment effects and enables all such 
treatment comparisons to be made with equal precision. A practical 
advantage of the design is that the number of treatments per block is 
less than the number of treatments to be tested, thus allowing a large 
number of treatments to be tested in relatively small blocks of units. 
In such designs the positions within the block are not considered as being 
heterogeneous in character and can normally be regarded as unimportant, 
the allocation of treatments to these positions being carried out by a 
random procedure. If we regard the design as applied to an agricultural 
trial with the blocks as strips of land running vertically as columns, the 
positions within the blocks can be regarded as forming the rows of the 
design. In many instances these rows may well contribute systematic 
effects in the observations, and it may be desirable to remove row 
variability as well as block variability in a manner similar to that 
employed in the Latin Square design. Youden [1937] was the first 
to employ such an arrangement in experimental work on tobacco 
plants where the blocks comprised the separate plants and the rows 
consisted of leaf positions on the plant, leaves in different positions 
possessing different susceptibilities. These designs, known as Youden 
Squares, consisted of Balanced Incomplete Block designs in which 
the number of blocks equalled the number of treatments, the treatments 
being arranged in such a way that every treatment occurred once in 
each position (row). Balanced Incomplete Block designs in general 
do not possess the above property and the simple arrangement found 
in the Youden Squares is not always possible. In such cases however 
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partial balancing of the rows along similar lines to block balance con- 
ditions above is possible, and the conditions of balance together with 
the form of analysis are given below. 


ROW BALANCE 


Considering a Balanced Incomplete Block design comprising b 
blocks of k positions testing v treatments replicated r times, conditions 
(a) and (b) above result in the following relationships 


vr = bk Av — 1) = rk — 1) b>v 
the last condition being due to Fisher [1940]. 


Case r = mk 


When the number of blocks is an integral multiple of the number of 
treatments, as occurs with these designs, Smith and Hartley [1948] 
have shown that it is always possible to rearrange the treatments 
within each block so that each row of the design will contain m and 
only m replicates of each treatment. Each Incomplete Block design 
satisfying such a condition may thus be arranged to form a generalised 
Youden Square design and analysed accordingly. The analysis of 
Youden Squares is well known, Cochran and Cox [1950] giving the 
analysis with and without the recovery of inter-block information; it 
will consequently not be considered here. 


Case r = mk + 7; m, 7, integers 


In this instance which covers all the remaining Incomplete Block 
designs, it is clearly impossible to specify that each treatment occur 
a constant number of times in each row as this would necessitate 
r/k = b/v being an integer. The only design of suitable size for practical 
application which involves a balanced arrangement with respect to 
rows similar to that employed for block balance is given by Pearce 
and Taylor [1948]. It has often proved useful in fruit tree experimenta- 
tion. 

We may utilise the concepts of partial balance initially defined by 
Bose and Nair [1938] and of group divisibility as defined by Bose and 
Shimamoto [1952] to produce the following conditions of balance:— 

(a) that each treatment occurs at least m times and at most (m + 1) 
times within each row. 

(b) that the v treatments fall into g equal groups of s treatments: 
any two treatments in the same group concur Aj times in the same row; 
any two treatments in different groups concur )4 times in the same row, 
concurrences being defined as the number of treatment pairs (1-2, say) 
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with both numbers in the same row, pairs being counted as different 
if they differ by at least one member. 

An example of such a row balanced incomplete block design is 
shown forvy = 6,r = 5,6 =10,k = 3,A=2,N = 9) Ag = ".6y gos 


Block 
Row 1 2 3 4 5 6 7 8 9 10 
1 1 2 4. 1 3 33 6 5 6 
2 2 il 3 6 1 4 5 2, 6 4 
3 5 6 it 3 4 2 2, 4 3 5 


There are three groups of treatments, namely (1, 6), (2, 4), (3, 5), and 
within row concurrences of 2-4 say total \{ = 9 while pairs such as 
2-5 occur together \3 = 8 times. Hartley ef al. [1953] have shown that 
all Balanced Incomplete Block designs as listed by Fisher and Yates 
[1953] fall into one of the above categories and are thus capable of 
being arranged to form generalised Youden Squares or designs partially 
balanced with respect to rows.* 


ANALYSIS OF THE DESIGN 
Caser = mk + 7 


We need the following properties of the design: From condition (a) 
of the row balance conditions it follows that there will be 7 rows in 
which there are m + | replications of any specified treatment. Within 
each of these + rows one treatment replicate may be singled out and 
termed an ‘odd’ replicate of the treatment as against the remaining 
even replicates. There will thus be 7 ‘odd’ replicates altogether of each 
of the v treatments and mk ‘even’ replicates. The total number of 
concurrences within rows of any two specified treatments (1-2 say) 
may be classified into three types 


(a) ‘even’ 1 — ‘even’ 2 having m*k concurrences, 
(b) ‘even’ 1 — ‘odd’ 2 having mr, 
‘odd’ 1 — ‘even’ 2 having mr, 
(c) ‘odd’ 1 — ‘odd’ 2 having), = J — 2mr — m’k if 1 and 2 
occur in same group, and 
ho = XE — 2mr — m’k if 1 and 2 
occur in different groups. 


*Plans are available at the Applied Mathematics Laboratory. 
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The theoretical model is the common one in such circumstances, 


Ci) = w+ pi + 8B; Up eee 


the z,;(.) being assumed normally distributed with zero mean and 
variance o”. Maximising the likelihood, which involves minimising 


Ye yy Ce ee i v1)” 


subject to the usual conditions that the sum of the p,; (row effects), 
8; (block effects) and », (treatment effects) are separately zero, results 
in the equations of estimation 


V,=ra+ry, — Sir; — S,(0;) ee ae) 
B; = ka + S,(v,) + kb; g = 1s 
R; = ba + S;(v,) + br; t= 172, 
G = bka. 


Roman letters being used as solutions of these equations give estimates 
of their Greek counterparts, and 


S,(r;) denotes the sum of the row indices over those rows containing 
the odd replicates of treatment v,, 


S,(v,) the sum over all treatment indices for which an odd replicate 
appears in the 7th row, 


S,(b;) the sum over those block indices of blocks containing treat- 
~ ment v, 


S;(v,) the sum over all treatments contained in the jth block. 


Introducing 
ee ea See Ay 
E, il V, k S,(B;) b S,(R,) ae bk Gg 


for each treatment, V, , B; , R; and G being respectively the totals for 
the treatment v, , the jth block, the 7th row and the grand total of all 


observations, solutions of the above equations give the treatment effect 
estimate 


ee ee 
ote bole mea e+ pot 8. | 


where c = r — (r — ))/k, there being g groups of s treatments per group, 
and S,(H,) denotes the sum of the /, corresponding to the treatments 
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contained with treatment ¢ in its group, including treatment L A= 
Ai — Ae. 

The analysis of variance follows the familiar pattern of fitting 
constants, firstly in the absence of treatment effects and then in their 
presence. The reduction in the sum of squares due to fitting treatment 
effects is then obtained and the ‘Between Treatments’ sum of squares 
becomes 


b : A 2 
(be — 7 +2.) bz: i Saray a | 


The Blocks and Rows components of variance follow the familiar 
pattern for two-way designs and the table of Analysis of Variance 
appears as in Table 1. The ratio of Treatment mean square to Error 
mean square provides a valid test of significance for the treatments, 
the ratio being distributed as F with v — 1 and bk —-b —-k —v+2 
degrees of freedom. 


TABLE 1 
Variation S.s. Dip Expected s.s. 
; kD) +227 b; 8:0) 
Between blocks ee B;—c.f. b—1 : : 
i 
1 
+522 Sio)+(b— Io? 
; b> ri +2 Dd) 7: Si(v,) 
Between rows > 5 Ri-c.f. k—1 : 
i 1 
cee Si@.) +(k—1)o” 
2 A 2 
c De, Vr =F. 9D, S,(v.) 
Between treat- (text above) et t g 
ments - On 1) o 
Error (by subtraction) |},——k—v+2| (bk—b—k—v+2)o” 
Total Perey = cok: bk—1 


c.f. denotes the correction factor G2/bk. 


All treatment comparisons are not estimated with equal precision. 
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Comparing two treatments in the same group 


2b 2 
Var. (vy, — v,) = (SaaS ous 
but for two treatments in different groups 
2b A \ ; 
UE SN Ge B14 ee eee aah « 


The analogy with the generalised Youden Square designs (b = mv) and 
Balanced Incomplete Block designs is at once apparent, for in the former 
7 = 0 = X, = A, , and in the latter the rows are disregarded. Both 
expressions above reduce to the correct value 20°/c for these designs 
(Yates [1936], Cochran and Cox [1950]). For such designs we have 


ee 5 SB, =Q, (Yates [1936)). 


Lack of complete row balance is illustrated by the two types of treat- 
ment comparisons, the variances of which tend to equality the closer 
the value of X{ is to Xj, i.e. the nearer complete row balance is achieved. 

Hoblyn, Pearce and Freeman [1954], when considering the design of 
experiments suitable for application to fruit plantations, obtained 
similar arrangements to those described above; these designs in their 
classification being Type O:7P with the added restriction that the 
P-element, i.e. the partially balanced one, shall be group divisible. 
This subdivision into groups of treatments partly overcomes the diffi- 
culty common to partially balanced designs, in that all treatments in 
the same group are compared with equal accuracy. 


ANALYSIS WITH RECOVERY OF INTER BLOCK INFORMATION 


From the equations of estimation given earlier, considering the sum 
of those block totals containing treatment ¢ we have 


S(B;) = ra + w | a) v, + 8,(b,). 
Comparisons of such totals for two different treatments thus involve 
comparisons of the treatment effects as well as various block and row 
effects. This information on the treatment effects however, cannot be 
utilised to improve the accuracy of the design, for both block and row 
effects were assumed to be systematic constants unknown in character 
and each subject to the zero sum condition. Assuming the block effects 
to be normally and independently distributed with zero mean and 
variance o; it is possible to utilise the inter block information and to 
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some extent increase the accuracy of the experiment. Minimising 


1 

ee > (Zi) — Bb — ps — a Se 2, Gescy) = kp — S;(,))” 
with respect to u, », , and p, results in the maximum likelihood esti- 
mates of the parameters. 

Caser = mk + 7 


Minimising the weighted sum of squares above we obtain the 
maximum likelihood estimates of v, as 


— b / 
# She ee fe = wate oe 


= / 
be — FR — 8d + for — nb OB) + ssycsry} 


where f = a /(o” + koi) and 


— 


rG 


7; SB) — 


Estimates of o and o; may be obtained in two ways. Firstly, it is 
noticed that the ‘Between Blocks’ component of the systematic analysis 
may be further subdivided thus 


Source of variation S.s. Die Expected s.s. 


rk(v—k) 2 
2 =i Oo; 
Treatment compt. eas (se) —"2) esi Ie 
DT athe ihe 
t 


Remainder (subtraction) b=0 (b—») (+ ko) 


1 
Total blocks p De Bi—c.f. b-1 


The above tabulation shows the expected values of the sum of squares 
assuming random block effects. It is seen that the remainder provides 
an unbiassed estimate of o” + koj. The ‘Error’ component in the 
systematic analysis (Table 1) provides an estimate of a”. 

For those designs in which m = 1 the above estimate of o° is based 
on few degrees of freedom; in such cases an alternative procedure is 


8 BIOMETRICS, MARCH 1957 


available giving an estimate based on a larger number of degrees of 
freedom. This consists of obtaining estimates not by substitution of 
the random model in the analysis of variance testing treatments assum- 
ing systematic block effects, but rather by substitution of the random 
model in the variance analysis testing block effects (assumed systematic). 
Regarding the 8; as systematic constants the likelihood ratio test of 
significance for block effects is obtained by the method of fitting con- 
stants for the mean, row, and treatment effects in the presence of 
block effects and in their absence. In this case rows and treatments 
are not orthogonal and the method for treatment of non-orthogonal 
data (Kendall [1948]) is used. The error sum of squares however will 
remain unchanged from Table 1 and the resultant analysis is shown in 
Table 2. 


TABLE 2 
Source of ID)sie S.s. 
variation 
1 2 
k » R;-— c.f. 
Rows and b 2 
treatments Cte tool > (E+E) 
+> —>— | D {8,481} 
br— 7th Ssh ee 
Blocks b—=1 by subtraction 
Error bk— b—v—k—2) from analysis Table 1 
Total bk—1 


Substituting random block effects for systematic effects in Table 2 
the expected value of the blocks component becomes 


br(v — k) { (vy — s)A 
k(b — 1)o, — ——“——~- (1 us) < 
( Nes pe ae ayy Lancs oma ea 
+ (6 — Do’. 
Equating the calculated value to the expected value of the Blocks 
and Error mean squares thus enables estimates of o” and o? to be otained. 
The variance of the difference of treatment effects also depends on 
2 
the values of o° and oj but may be approximated by substituting 
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estimates for these values. Assuming both o” and o? known we obtain 
for treatments in the same group 


mM 2b 5 
Pe erie he 


while for two treatments occurring in different groups 


os 2b 
be —7r+X, + for — d)/k 


Var. (v, — v,) 


Var. (v, aa 5) 


A 2 
{1 Se TS ea sate 
EXAMPLE 


An example illustrating the methods of recovering information from 
blocks in Incomplete Block designs with Row Balance is provided by 
the experiment below, by the Incomplete Block design with v = 6, 
fo— 2b = 15h = 1, = 13, 46 = 12 and g = 2. The results are 
those from an experiment conducted by Paul [1943] to compare the 
effects of the length of time in cold storage on the tenderness and flavour 
of beef roasts. Six storage perjods were considered, these corresponding 
to the six treatments of the Sivion while the blocks corresponded to 
cuts of meat from the same relative positions of the animal (e.g. shoul- 
ders). The figures obtained represent the total scores of four judges 
marking on a scale of 0-10, a high score indicating tender beef. Although 
the treatments, their subdivision into blocks, and the corresponding 
scores are those obtained from the conducted experiment the positions 
within the blocks have in some cases been rearranged to conform with 
the balance conditions of the rows. The results of the experiment 
appear in Table 3a, the blocks corresponding to the columns and the 
third and fourth rows being continuations of the first and second 
respectively. 


TABLE 3a 
Row Treatment (v) and score (2) 
1 v: fh Be BB ap DEE TE aH 


(eel aa) 27) 2330) 10) 262432) 345 40) Sl 2132, 


% | o 28 & @€ bb & & & Boe B Zit bo B # 
17 26 29 17 27 29 25 37 26 34 25 25 27 24 26 


Treatment groups: (1, 3, 5) and (2, 4, 6). 
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TABLE 3b 

v V S.(B;) bkEy vkE} bk(E. + E!) 
1 70 206 =071 —302 — 1726 

3 132 256 139 —2 134 

5 158 285 484 172 914 

2 115 241 —184 —~92 —414 

4 139 262 221 34 306 

6 155 288 311 190 786 


Checks are provided by the column totals of Table 3b which sum re- 
pee iee. to one and k times the totals of all observations in the case 
of cols. 2 and 3-(from left) and to zero for cols. 4, 5, and 6. 

Both types of analysis of variance with recovery of information 
have been carried out with this example, the first (Table 1) providing 
a valid test for treatments and involving the subdivision of the ‘Between 
Blocks’ component. The second analysis, providing estimates of o” 
and oj by the method illustrated in Table 2, gives no test for treatments. 

From the results of Analysis (1) in Table 4 we have, by equating 
the Blocks Remainder and Error mean squares to their expected values 


oa? + ko? = 52.59: oF =8.19; f = 0.1557. 


Equating the Blocks mean square of Analysis (2) to its expected 
value and using the Error mean square gives the equations 


2.881lo; + o = 37.49, o = 8.19 hence (= 02425. 


Estimates of treatment effects ignoring recovery of information are 


given by 
n=tle be #5 8.08) | 


Utilising the inter block information and the value of 0.156 for f gives 
the formula for treatment effects as 


2 15 eae Lee 

”% = 7067 {z, + 15627 +6) 67 S(H, + 1568}. 

This value of f is also used in the lower part of Table 4. In order to 
compare briefly the different possible arrangements the variance of 
treatment comparisons have been made for the Balanced Incomplete 
Block design both with and without the recovery of inter block in- 
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formation and with and without row balance; in this example the 
removal of row effects is only slightly advantageous, as is to be expected, 
no allowance being made for these in the original experiment. 


TABLE 4 
Analysis (1) Analysis (2) 

Source of variation | S.s. D.f. | M.s. |Source of variation) S.s. Dae |) IN ee 
Blocks: 1051.46} 14 Blocks 524 .57| 14 37.47 
Treatment compt. | 578.16) 5 
Remainder 473.30) 9 52.59 
Rows 12208 (FL Rows & 

treatments 1050.67) 6 
Treatments 511.70} 5 |102.34 
Error T3Lt3\-—-9 8.19| Error (3.13), 9D 8.19 
Total 1648.97} 29 =. iLotal 1648.97} 29 


Variance of Treatment Comparisons 


Treatment pair 

Balanced incomplete block design: Same group | Different group 
(a) without removal of Rows and recovery of 5.721 Seda 

information 
(b) removing Row effects only 5.461 5.591 
(c) recovery of information only 5.183 5.183 
(d) removing Row effects and recovery of 

information 4.948 5.054 


The application of such designs to the analysis of virus local lesion 
experiments is at present under consideration. It has been shown by 
Fry and Taylor [1954] in half leaf experiments on French Beans (Pha- 
seolus vulgaris) when comparing local lesions produced by virus prep- 
arations of different concentrations, that Incomplete Block designs 
using half leaves as the experimental unit are more efficient than treat- 
ments using a common standard. In this case the block (plant) possesses 
four units (half leaves) only, and positions within the block are desig- 
nated by leaf number and side (left or right). Comparisons $0 far 
carried out involving six treatments have shown a relative efficiency 
of greater than 125% when compared to the Incomplete Block design 
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in 16 out of 27 trials, whilst in five such trials efficiencies of less than 
100% were recorded. 

It is difficult to compare the relative efficiencies of the various types 
of Incomplete Block designs mentioned owing to two factors which are 
to a large extent intrinsic to the experiment; these are (a) the size of 
the between row variation and, (b) the magnitude of the f value which 
measures the ratio of within to between block variability. Ignoring 
the recovery of inter block information, the advantages of the removal 
of row variation by using row balanced designs agree closely with those 
enjoyed by the Latin Square when compared with the Randomised 
Block design. Utilisation of inter block information again increases 
the accuracy of the experiment at the expense of a valid test for treat- 
ment comparisons. 
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EXTENSION OF MULTIPLE RANGE TESTS TO GROUP 
CORRELATED ADJUSTED MEANS 


CriypE YounGc KRAMER 


Virginia Agricultural Experiment Station of the Virginia Polytechnic Institute, 
Blacksburg, Virginia, U.S.A. 


1. INTRODUCTION 


In recent years several writers have developed multiple range 
tests of differences among group (e.g. treatment) means that have 
equal variances and zero covariances. Duncan [1] describes various 
multiple range tests and points out their superiority over multiple 
t-tests when no relation among treatments is specified. Kramer [3] 
extended the multiple range test to group means with unequal numbers 
of replications. The extension to adjusted means that are correlated 
will be explained here in reference to Duncan’s ‘““‘New Multiple Range 
Test,” although it is applicable also to the tests of Keuls, Newman, 
and Tukey. 

In Duncan’s test, the difference between two means is significant 
if it exceeds a shortest significant range. The shortest significant 
range, R, , is obtained by multiplying the standard error of a mean, 
s, , by a value z,,,, of the studentized range, which Duncan has tabled 
for both the 5% and 1% levels. In Duncan’s notation, n, is the number 
of degrees of freedom of the error mean square and p = 1, 2, --- , kis 
the number of treatments. 

Consider an experiment with five treatments, A, B, C, D, and EH, 
each replicated r times. Suppose the ranked means are 


Yo Ys Yo Yu Yo- 
Then 7g — Jc must exceed Rs = 8,25,n, , to be judged significant. If 


it does, 7g — Ja must exceed Ry = 8,24,,, , to be judged significant, 
etc. A table of R, = s,2,,n, would be convenient for applying this test. 


2. GENERAL METHOD FOR ADJUSTED MEANS WITH 
HETEROGENEOUS VARIANCES AND COVARIANCES 


Suppose one has a set of k adjusted means with estimated variances 
and covariances, ¢;;s and c,;s. Since the standard error of each 
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mean may then be different, a modification of Duncan’s test is desirable. 
2 2 
If c,;s° = ¢,;s and c,;;s\ = 0, then 


oy = Seoul NV 2 (1) 


Now if we have only two means, 7; and g, , with estimated variances 
and covariances C18", C228 and ¢,.s°, and use the principle of (1) we get 


2 
G-=— > J& = ar Ae Coo)8 ie (2) 


if J, — J. is significant. If a ¢-test is used we get 
Th = Thy Sy (Crs renee Cs: bn (3) 


if J: — J 18 significant. By comparing Duncan’s tabled 22,,, with 
tn, , 1t is seen that 


tng a Z2.na/ \/ 2; (4) 


so that the right hand members of (2) and (8) are equal. 

Generalizing the above to k means, a conservative test to find differ- 
ences among adjusted means may be obtained by requiring that 9; — 9; 
exceed V(c;; — 2c,; + ¢;;)8°/2*2p.n, , to be judged significant. This 
may be written 


(G9: — G) V2/Cis — 2s; -F C;;) > Sep.ne > (5) 


so that a table of factors R, = sz,,,, would be convenient if ‘adjusted’ 
differences were obtained from the left member of (5). This test gives 
a rapid, useful approximation, but if the adjustments under the square 
root sign differ too greatly, there is a possibility that there will be a 
significant difference within a subset of means judged homogeneous by 
this test. If Vc,;; — 2c;; + ¢;; can be assumed constant without 


appreciable error for all 7; — 7; , then (5) becomes 
CF = 9:) EN $3,-3,/2 a) (6) 


and the possibility of a significant difference within a homogeneous 
subset is removed. 


3. METHOD FOR COVARIANCE ANALYSIS 


The precision of comparisons among means of different treatments, 
even in a statistically designed experiment, can often be increased by 
covariance analysis. However, if the adjusted treatment mean square 
is significant, there is the problem of which adjusted treatment means 
differ significantly. For simplicity of computation many texts have 
suggested that the approximations for the standard error of the differ- 
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ence between two adjusted means given by Finney [2] could be used for 
multiple f-tests. 

If y represents the variate to be estimated and x the measurement 
which is correlated with y, the adjusted mean of the 7th treatment is 


Yi: =. b(z; oa ft), (7) 


where 7; and Z; are the means for the 7th treatment, @ is the grand 
mean, and 0 is the error regression coefficient. The estimated variances 
and covariances of the adjusted means, each replicated r times, are 


1 cal 4 ib aa 
(pe E a zy : 6506 E -f- Hi ly ; 
r Eee r ie (8) 


where s” is the residual error mean square for y, and E,, is the error 
sum of squares for the z-analysis. Putting (8) in (5), and since 
2 LZ; = Z; : 
(i: — 2¢;; + ¢;;) = E a.m ar), (9) 
we require 


2rE 
iy. — 7). |——— a Ta 10 
(CF 7;) 2H. + r(Z; ee £,)" > SZ, a ( ) 
for 7; — 9} to be judged significant. 

Consider a randomized block experiment with r replications of 
five treatments, A, B, C, D, H. Suppose the ranked adjusted means are 
Yo 94 Yo Ya Oe. 

Applying (9), (9 — 92) V 2rH.,/[2E,, + 1(z — c)"] must exceed 
s'Z5.n, for 7f — Yé to be judged significant. If it does, (gf — 94) 
V 2rE,./(2E.2 + 1p — £,4)°] must exceed s’z,,,, for 73 — G4 to be 
judged significant, etc. A table of factors hf = s’z,,,, would be con- 
venient if “adjusted” differences were obtained from the left member 
of (10). 

The approximations for the variance of the difference between 
two adjusted means discussed by Finney simplify the computation 


greatly. If all the differences (¢; — Z;) are very small, one can use 
J (11) 
Sie = = 
Vv if 


realizing that this is a consistent underestimation of the standard 
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error of an adjusted mean. A table of factors Rj’ = s’2,.n,/ ~/r would 
first be obtained, then gf — gé would be judged significant if it exceeds 
Ri’. If it does, then 7f — 7{ must exceed R{’ to be judged significant, 
etc. 

A better approximation uses 


sl ibs 
a = [I T= DEL! aa 


where T,, is the treatment sum of squares for the z-variable. As 
pointed out by Finney, this removes the bias which affects equation (11). 
The needed table of factors is RJ’ = ~/s{1 + [T../(k — lEvel}/r- 
Zy.n,, and 7f — 7% above must exceed F{’’ to be judged significant, etc. 
In each of these approximations the difference 7; — 9; is not adjusted 
by the factor appearing in the left member of equation (10) because the 
standard error of an adjusted mean is taken to be the same for all means. 


4, METHOD FOR BALANCED INCOMPLETE BLOCK DESIGNS 


If we have v treatments each replicated r times in q blocks con- 
taining k different treatments each and let \ be the number of times 
every treatment appears in the same block with each other treatment, 
then the estimated variances and covariances of the means adjusted 
by the intrablock error are 


C8 = c,8 = ( —isirkh*, 4s == Nr ke (13) 
where s° is the intrablock error and EF = [r(k — 1) + AJ/rk. Now 
Ci — 2¢;; +¢;; = 2/rE = 2k/[r(k — 1) + Al, (14) 
therefore 7 — 7; must exceed 
ks? 
[r(k pa 1) + r] @p,na (15) 


to be judged significant. A table of factors R/ based on (15) would 
be convenient and again the differences between the means are not 
adjusted because the standard error of the difference of two adjusted 
means is the same for all means. 

If recovery of interblock information is used to adjust the means, 
y; — 9; must exceed 


DD k(v = i) 
fi," = rlv(k — 1)w, + (vy — k)ws] &pina (16) 


where w, = 1/MSE and w, = v(r — 1)/[k(q — 1)(MSB adj.) — 
(p — k)(MSE)] to be judged significant. 


Ry = 


MULTIPLE RANGE TESTS 17 


5. METHOD FOR LATTICES 
A. Simple Lattices 


If we have v = k treatments each replicated r times in blocks 
arranged in two groups, there is a different standard error of the differ- 
ence between two adjusted means accordingly as the two treatments 
occur or do not occur in the same block. Usually one uses an average 
standard error of all comparisons if a ¢-test is used to test differences. 
Using this average standard error 7; — g/ must exceed 


2 


ns s 4w = 
R; T r(k + 1) E ae w “is (k 1) Jenn ) (17) 


where s’ is the MSE, w = 1/MSE, and w' = 3/[4(MSB adj.) — (MSEB)], 
to be judged significant. 


B. Triple Lattices 


If we have v = k’ treatments each replicated r times in blocks 
arranged in three groups, there are again three different standard 
errors of the differences between two adjusted means. Using the 
average standard error 7; — 7}; must exceed 


—_ s 4w “s 
RZ = Vie ED Fea (k 2)| Doe (18) 


where s’ and w are defined above and w’ = 5/[6(/SB adj.) — (MSE)], 
to be judged significant. 


6. GENERAL METHOD FOR PARTIALLY BALANCED DESIGNS 


It is always possible to obtain an average standard error of the 
difference between two adjusted means which may be used for all 
comparisons, whether we have a simple, triple or near-balanced lattice, 
a two-associate-class design, etc. As indicated above, this implies that 
Ve; — 2c; + ¢;; is constant for alliandj. Then 7; — 7; must exceed 


Average standard error of all comparisons 
Bz = ep na 
V2 


to be judged significant. Ordinarily, this average standard error of 
all comparisons may be used without appreciable error. 


(19) 


7. METHOD FOR UNEQUAL NUMBERS OF REPLICATIONS 


The author’s method for unequal numbers of replications [3] is a spe- 
cial case of the general method with ¢;; = 0. Ife; = 1/n; and ¢;; = 
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1/n; where n; and n; are the number s of observations on treatments 
7 and j, then we require 


ar S 2nin; 
(Wi a qi) n; ae Nn; SZ Sfp no (20) 


for (g; — 9;) to be judged significant. 
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Introduction 


In a recent climatological investigation, the author had occasion 
to compute the regression of monthly rainfall on the altitude, latitude 
and longitude of certain observing stations in South Australia. The 
original data comprised the monthly rainfall at each of fifty stations 
for the years 1888-1949 inclusive, and regression formulae were calcu- 
lated for both mean monthly rainfall and monthly totals in individual 
years. 

For any particular month, the analysis of variance of the 3100 
(50 X 62) rainfall totals then assumed the form shown in Table 1. 
Designating the determining variates altitude, latitude and longitude 
by 2, , 2 and z; respectively, with corresponding means @, , Zz AMG clan, 
and regression coefficients b, , b, and b; , and writing 


ie 
j=1,2,--- ,50 


Dx = [v5 7 Lei 


for the matrix of deviations from means, 
bi. 
i ay 
bse 
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TABLE 1 


ANALYSIS OF VARIANCE 


Variation due to Degrees of 
freedom 
Between years 61 
average regression 3 
Between stations 49 
S 
average deviations 46 


from regression 
regressions X years 183 
Years X stations 2989 


deviations from regressions 2806 
within years 


Total 3099 


for the vector of regression coefficients in the sth year, and 


for the vector of average regression coefficients, the sum of squares for 
the 62 regression formulae, with 186 degrees of freedom is 


62 
DEACON. 
s=1 
and the sum of squares ascribable to the average regression is 
62b’(XX’)b. 
The difference between these two quantities 


62 oe = 
>) bi(XX’)b, — 62b’(XX’)b 


a=] 
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is the sum of squares for the interaction of regression coefficients with 
years. The relevant point here is that the corresponding mean square 
was strongly significant in all months, and thus for the purposes of the 
enquiry it became important to find assignable causes of this variability. 

With this objective, the regression of the b,,(s = 1, 2, --- , 62) on 
two further variates, mean monthly rainfall and time, was calculated; 
this was done also for the b., and b;, , thus introducing in all, 6 new 
regression coefficients. Obviously these coefficients are linear functions 
of the original rainfall observations, but it is not immediately clear how 
other points regarding them are resolved; for example, how is their 
component sum of squares calculated and what constitutes their vari- 
ance-covariance matrix? 

The procedure to be followed in problems of this type has been 
given in certain special cases; for example, Yates ([1935,] pp. 209, 210) 
outlined the calculation of the component sums of squares in the inter- 
action of two fertilizer treatments which had been varied quantitatively 
as an essential feature of a particular experimental design. The full 
implications of the statistical methods given in this and other cases are 
not, however, directly apparent, since the variates used were orthogonal 
both within and between sets, as in the example quoted, or the analyses 
were simple because there was only one variate in one or both sets. 


General case 


Denote the dependent variate by y, and the independent variates 


by 2; , 22, °°: ,%,. Assume there are n values of each, and that the 
observations are taken m times on the variate y. 
Let 
=k ye 
X = [x.; — 2] : ore i 
= 1,2, - , 
5 Keele 25 n 
Y = [Yu — Hi) 
| WR a mr (7 
and 
B = [b,.] Pe ie Ay 
ee a Pp A 


The m sets of normal equations corresponding to the m regressions of y 
on %, , Z2, °** , £» may then be written as 


(XX) Be= XY. (1) 


22 BIOMETRICS, MARCH 1957 


Let the second set of determining variates be 2; , 22, °*- , 25 


ETD Oe 
fp eu» rae z,], U Ih ? rq 
p= 1,2,--" 5m 
and the second set of regression coefficients be 
= 1, 2 ee, 
gata 5 
f= ee 


The normal equations involving the regression coefficients 6,, of the 
b,, on the variates 2, , 22, °** , 2, are then 


(ZZ’)@ = ZB’. (2) 
From (1) 
Bea xy xy? 
so that from (2) 
(ZZ) 6 =e ZY XXX 
or 
(ZZ’)@(XX’) = ZY'X’. (3) 


After carrying out the matrix multiplications indicated by (3), and 
equating corresponding elements of the matrix products, a set of pq 
equations is obtained, from which the pg elements in § may be calcu- 
lated. 


Taking p = 3, ¢ = 2, for example, and letting X;, and Z,, now stand 
for the general elements in (XX’) and (ZZ’') respectively, the left hand 
members of the equations (3) may be set out as the product 


Aut) mis AaoZii ea sZag eke Oi eee ee 
EWA EWE OCA PEA OGVAR CHA | ich, 
AGA OVA CVA EWA OC GVA |) labs 
IOWA GA GT OVA OO, Cais || as . 
XaZis XisZig Noein |XaaZag Nee aan 
3412 X1sZo2 Xeo3Z12 Xo3Zo2 X3sZi2 Xa3%o0_|| Bos 


The 6 X 6 matrix here given may also be presented more concisely in 
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the form 
DE CAAA. OF VAR EO. CR CHAR) 
Meal Ze Xenon SoA eZ)) |: 
Mas(ZZ’) Xas(ZZ’) Xae(ZZ) 


it is, in fact, a special type of product of (KX’) and (ZZ’) designated the 
Kronecker (or direct) product (see, for example, Murnaghan [1938], 
pp. 68-70) and is frequently written 


(XX’) X (ZZ’). 


Correspondingly, the coefficients of the y,, in the right hand members 
of (3) form a matrix which is the Kronecker product 


XX Z. 


The argument holds true for general values p and q, so that if the 
elements of $ are written out as a column vector, taking the columns of 
8 in order, and the elements of Y are also written as a column vector 
taking the rows of Y in order, and these vectors are designated 6, and 
Y, respectively, the equations (3) may be set out in the alternate form 


{(KX’) X (ZZ’)}B. = (K & Z)Y, (4) 
of which the solution is 
Bi = {(XX’) X (ZZ’)} "(KX ZY, 
{(XX")"" X (ZZ) "YK x Z)Y, 


using the property that the reciprocal matrix of a Kronecker product is 
the Kronecker product of the reciprocals. 

The set of normal equations (4) is thus what would have been 
derived, by the standard least squares method of fitting, from the original 
nm values of the y,; and the nm values of each of the pq variates 


I 


(5) 


I 


2) Gu ey) = PCa ye Eee = 25) 
Geis) Cue e0), (fp — Fo) Gao 122) (ey) Gee.) 
cans (a; aot E,) (1 a 21), Ge as E,) (22 sag 22), peo, @, = Paes i Zi) 


which may be regarded as having been generated by taking the Kro- 
necker product of the two sets 


(iE) la — 92a) os, Gy &) 
and 
(a = 21), (22 a 22), oe y ea oe Bq). 
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All subsequent operations thus reduce to standard procedure; for 
example, the sum of squares ascribable to the coefficients 6,, is 


Bi {(KX’) X (ZZ’)}B , (6) 
or in its alternate form 
G(X X Z)Y, , (7) 
and the variance-covariance matrix of the {,, 1s 
o° {(KK’) X (ZZ")}" = 0° {(KX")* X (ZZ’)"}, (8) 


where o’ is the residual variance of the yz: . 
If both members of (3) are transposed, the relation 


(XX’)@"(ZZ’) = XYZ’ (9) 


is obtained. This corresponds to a reversal in the order of fitting the 
variates x and z, and consequently, if the order of the elements in the 
vectors 6, and Y, is adjusted accordingly to give vectors 6, and Y, , 
the counterpart of (4) is 


{(ZZ’) X (KX’)}B. = (ZX X)Y. , (10) 


a set of normal equations identical with what would have been obtained 
by taking the regression of the y,; on the variates 


eisai) re Pr) (2 i) Bs) eer tee) eee ee)e 
(Cpe 2s) (Gi Ta) 52 =n) oe Ba), ay (eee nea) Open) 
resid (2, ay 2y)(u1 = Ti) (Za ae 2.) (a2 “a 2), Rises) (2, aa: Bae, ia a) 


Note also that the equations (10) can be derived from (3), and the 
equations (4) can be derived from (9), in accordance with the fact that 
the matrices (XX’) X (ZZ’) and (ZZ’) & (XX’) can be converted into 
each other by a suitable permutation of rows and columns, and similarly 
forex x<7Zand Zc xX. 

The final result is thus independent of the manner in which the 
calculations are made, but in general, either two-stage method will 
involve less computing than the standard procedure on pq variates. 


Example 


In the example of the Introduction, the units of measurement were 
inches for rainfall, 100’ for altitude, 10°* degree for both latitude and 
longitude. With these units and the variates of the first set in the 


order altitude, latitude, longitude, and those of the second set in the 
order mean rainfall and time, 


KRONECKER PRODUCT 


1138. 265050 


(KX’) = | —161.151320 
215.304630 
[ 0.00110607 
(0:0:O ee a 0.00006195 
L—0.00115655 
175.2654 
(ZZ’) = 
| —722 3850 
[0.00671215 
(ZZ’). = 
| 0.00024420 
13.1641 
ZB’ = 
— 63.1084 


so that 
6 = (ZZ’)-1ZB’ 
0.00671215 0.00024420 
r ae 0.00005925 
0.072948 0.022898 
% ee 0.000619 


From these we obtain 


—161.151320 215.304620 
534.296632 —125.495288 |, 

—125.495288 199. 183242 
0.00006195 —0.00115655 
0.00220017 0.00131925 }, 
0.00131925 0.00710186_| 

eee 

19855. 5000 


0. aie 
, 


0.00005925 
3.5660 —14. sd 
— 4.2488 30.5231 


| 13.1641 3.5660 —14.6490 
—63.1084 —4.2488 35.5231 
— 0.089651 | 
_0,001473 | 


| 
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The sum of squares ascribable to the regression coefficients in 6, is then 
Bi{(KX’) X (ZZ’)}6, = 950.06, 


and the analysis of variance for the month of June takes the form shown 
in Table 2. 


TABLE 2 
ANALYSIS OF VARIANCE OF JUNE REGRESSION COEFFICIENTS 
Variation due to Degrees of Sum of squares 
freedom 
Between years 61 8765.99 
Average regression 3 4135.11 
Average deviations from average 
regression 46 896.49 
Between stations 49 5031.60 
Regression on mean rainfall and 
time 6 950.06 
Remainder fp 415.58 
Regressions X years 183 1365.64 
Deviations from regressions within 
years 2806 1337.28 
Years X stations 2989 2702.92 
Total 3099 16500. 51 
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The logistic function dealt with here is 
ee at = TAN cee ep (1) 
Db, = In ©5/Q;) =a + 6a, , (2) 


where P; is the probability of an event at x;, L; is the logit of P;, a 
and 6 are parameters. It has a special relation to maximum likelihood 
estimation, stemming basically from the fact that the logistic function 
has simple sufficient statistics for the estimate of the parameters. 
These are >) n,p; = >, 7, and >) n,p.z; = >. r,a, for a and B respec- 
tively, where 7, is the total number ‘‘exposed”’ at 2,, r; is the observed 
number of events, p; = 1 — q; = 17,/n; . It is known that where 
sufficient statistics exist, the maximum likelihood estimates are func- 
tions of the sufficient statistics [8].* 

The estimating equations for the maximum likelihood estimate can 
be written 


I 


ae n(Di == i) 0 (3) 
SS Nx (Dp; a i) 0 (4) 


where #; = 1/[1 + exp {—(@ + 6x,)}] is the estimate of P; , obtained 
with the maximum likelihood estimates @, 8.** 


I 


*However, the maximum likelihood estimate is not necessarily, and is not in the present instance, 
a one-to-one function of the sufficient statistic and therefore the maximum likelihood estimate is 
itself not sufficient [7]. 

**It is of some interest to note that if p is a normally, instead of a binomially, distributed random 
variable, (3) and (4) are the estimating equations of a straight line with equal variance at all z;. The 
logistic function is thus seen to be the two-parameter function which has, in a certain sense, the simple 
relation to binomial variation that the straight line has to normal variation. 


28 
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Let & , Bo be provisional values for the estimates a, B; i= 0, + Bot; 
the corresponding value for 1; , the logit estimate; fp = 1 — q the 
corresponding provisional value for j; .* Then if for (Bo — p:) we use 
the approximation (jf) — #;) & (i, — l,)fogo given by the first term 
of a Taylor’s expansion, the estimating equations for da, 08, the cor- 
rections to the provisional estimates @, ; By are the solutions of 


Li np: — Linpo — 1@ DY nPodo — 0B inPodoe: =0 (5) 
De NiPit;, — > NiPot; — OB oD N:PoGoti — ap Sy N:PoGor: = 0 (6) 


These yield for the corrections 


5p = pew Spee — Dele Tam 
> wa, = (> w;«,)"/ >> w; 


_ Dp: — Di ho — a8 Do win (8) 
ew, 


where w; = Pogo , corresponding to 2, . 

The estimates obtained with these corrections are used as new 
provisional values in an iterative procedure which, when it converges 
to finite values, yields in the limit the maximum likelihood estimates.** 

It is to be observed, as noted by Anscombe [1], that the first terms 
of (5) and (6) are the sufficient statistics, functions of the observations 
only, and therefore are not calculated anew for each iteration. The 
values pj) and w; = fiogo will be obtained from J, ; it is therefore con- 
venient to have a table which, for argument / gives p and w = pq. 
Such a table is provided in Table 2.*** The calculation of the estimates 
for an example is exhibited in Table 1. 

The method outlined here is to be contrasted with that suggested 
by Fisher and Yates [5], and advanced by Finney [4]. That method 
is the analogue of the method set forth by Fisher and Bliss [2][3] for 
the maximum likelihood estimate of the cumulative normal distribution 
function. The present method, developed for the logistic function, is 
appreciably briefer and simpler in the calculations required. 


0a 


Comments on Example of Calculations 


The data used for the example with which to illustrate the method 
of computing the maximum likelihood estimate of the logistic function 


*The subscript 7 is omitted from fo , do , fo representing provisional values. ; 
**For some samples the maximum likelihood estimate does not yield finite estimates. 
#**Values in this table were calculated twice and then checked by inspecting first differences. 
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are taken from Fisher and Yates [5]; they used them to fit the integrated 
normal function. The initial provisional estimates for the present 
calculation were obtained by taking as 250 the same value as they 
used for the provisional estimate; for the slope Bo a logit line was used 
that fitted the points graphically about as did their provisional probit 
line. This gave & = —6.94, 8) = 1.04. Since the procedure here is 
iterative, in order to provide a definitive estimate, it is necessary to 
have decided on the precision with which the final estimates are to be 


TABLE 1 
EXAMPLE OF CALCULATIONS 
Data 
ay 3 4 5 6 a 8 9 
n 8 8 8 8 8 8 8 +4) p = 2.815 
Dead | 0 O 2 3 3 tf 8 *>> pr = 22.125 
p 0 O Qs seis BS oe UoOOo 


First iteration 


Equation for 7 obtained from provisional estimates, & and By (see text) is 


a 


l= 1.04¢ — 6.94. 4 
Values for # and w corresponding to values of / are read from Table 2. 


ay i p w 
3 —3.82 02146 .02100 > w = .90323 
4 —2.78 05841 05500 > wr = 5.90844 
5 —1.74 14931 12702 
6 — .70 33181 Oot > @ = 2.86251 
7 .84 58419 24291 > pe = 21.78174 
8 1.38 79899 16060 
9 2.42 91834 07499 > p-2 b=  .01249 
> we? = 40.48124 > px — >; px =.34326 
(> wz)?/>> w = 38.64980  wx(>> p — > 6)/DY w = .08170 
Diff. = 1.83144 Diff. = .26156 
06) =e 14282 Cres 
da = —.92042 a = —7.86 
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Third iteration 
7 = 1.20 x —8.00 


ar i Dp w 
3 —4.40 01213 01198 >, w= .80257 
4 —3.20 03917 .03764 > wr = 5.28108 
5 —2.00 .11920 10499 
6 — .80 31003 . 21391 > bP = 2.85392 
7 .40 59869 . 24026 > pr = 21.98036 
8 1.60 . 83202 .13976 
9 2.80 .94268 05403 “p-Dd p= _ .02108 
> wz? = 36.12938 > pr — Dd pr = 14464 
(> wz)?/d~ w = 34.75062 > wrx( p — > p)/D w = .18871 
Diff. = 1.37876 Diff. = .00593 
ap = .00430 B= 1.2043 
da = —.00203 & = —8.0020 


*Since n is equal at all values of x, n may be omitted; otherwise we should have De np and oy npx. 
**In applying the corrections dé and df to the provisional estimates for use with the next iteration, 
a working rule is to carry as many decimal places as correspond to 2 significant figures in the largest 


correction. 


stated. In problems of bio-assay such as the one dealt with here, I take 
it as satisfactory if the estimate of each of the parameters a, 6, and 
the 250 as finally stated is correct to three significant figures, and this is 
interpreted to mean that the third figure is correct within +1. The 
procedure employed yields, as the solution of each iterative cycle, 
the values of 0@ and 0, which are the corrections to be applied to the 
respective estimates; these are scrutinized and if the value of d@ or 
df is large enough to affect the third significant figure of the provisional 
estimate of any of the parameters, another iteration is required. In 
the present example the third significant figure corresponds to the 
second decimal place of each of the parameters. In the first iteration, 
as shown in the example, the corrections were, @ = —0.92, 06 = +0.14, 
and it is seen therefore that each correction affected the second decimal 
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figure. The second iteration, not shown in the example, yielded 
aa = —0.14, 08 = +0.02, and again both estimates were affected in 
the second decimal place. The third iteration is shown in the example 
and yielded d@ = —0.00203, 06 = +0.00430. Since each of these was 
less in absolute amount than 0.005, the second decimal and therefore 
the third significant figure of the estimates was not affected, and no 
further iteration is required. The estimates which are written from the 
results of the last iteration are @ = —8.00, B = 1.20-2.5 = 6.64. 

It is widely taught that, in situations like the present one in which 
a maximum likelihood estimate is obtainable only by using an iterative 
procedure, the estimate can satisfactorily be obtained by a single 
iteration based on some provisional value. Norton [6] has recently 
presented an illustration of how far from a maximum likelihood estimate 
an estimate can be if it is based only on a single iterative cycle. The 
difficulty is aggravated if, as in some current practices, the provisional 
value is obtained by a graphic fit made by eye. In this case it is doubtful 
even that the resulting number may be considered a proper statistical 
estimate, for it is not defined. One cannot, for instance, investigate 
whether it is or is not asymptotically efficient, since the result depends 
on who makes the initial graphic fit and how, and this is not a mathe- 
matical question. Two statisticians working with the same data may 
and generally will obtain different values for the estimate. The method 
set out here for the logistic function avoids these ambiguities. Examining 
the differences between estimates in successive iterative cycles seems 
the natural method to apply, and this is one, but only one, of several 
advantages of performing the calculations so as to yield the corrections 
to the provisional estimates, rather than the estimates themselves. 
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TABLE 2 
ANTILOGITS AND WEIGHTS 


Upper figure is p for specified value of logit 1; lower figure is corresponding w = pq. 
Tf lis negative, pis 1 minus the tabled value. Linear interpolation with 2 figures 
additional to the argument listed yields values correct to 5 decimal places. 


l 0 1 2 3 4 5 6 7 8 9 


0.0 |.50000 |.50250 | .50500 | .50750 | .51000 | .51250 | .51500 | .51749 | .51999 | .52248 
-25000 |.24999 | .24998 | .24994 | .24990 | .24984 | .24978 | .24969 | .24960 | .24949 
0.1 |.52498 |.52747 | .52996 | .53245 | .53494 | .53743 | .53991 | .54240 | .54488 | .54736 
-24938 |.24925 | .24910 | .24895 | .24878 | .24860 | .24841 | .24820 | .24799 | .24776 
0.2 |.54983 |.55231 | .55478 | .55725 | .55971 | 56218 | .56464 | .56709 | .56955 | .57200 
-24752 |.24726 | .24700 | .24672 | .24643 | .24613 | .24582 | .24550 | .24516 | .24482 
0.3 |.57444 |.57689 | .57932 | .58176 | .58419 | .58662 | .58904 | .59146 | .59387 | .59628 
.24446 |.24409 | .24371 | .24332 | .24291 | .24250 | .24207 | .24164 | .24119 | .24073 
0.4 |.59869 |.60109 | .60348 | .60587 | .60826 | .61064 | .61301 | .61538 | .61775 | .62011 
-24026 |.23978 | .23929 | .23879 | .23828 | .23776 | .23723 | .23669 | .23613 | .23557 


0.5 | .62246 |.62481 | .62715 | .62948 | .63181 | .63414 | .63645 | .63876 | .64107 | .64337 
- 23500 |.23442 | .23383 | .23323 | .23263 | .23201 | .23138 | .23075 | .23010 | .22945 
0.6 |.64566 |.64794 | .65022 | .65249 | .65475 | .65701 | .65926 | .66150 | .66374 | .66597 
.22878 |.22811 | .22743 | .22675 | .22605 | .22535 | .22464 | .22392 | .22319 | .22245 
0.7 |.66819 |.67040 | .67261 | .67481 | .67700 | .67918 | .68135 | .68352 | .68568 | .68783 
-22171 |.22096 | .22021 | .21944 | .21867 | .21789 | .21711 | .21632 | .21552 | .21472 
0.8 |.68997 |.69211 | .69424 | .69635 | .69847 | .70057 | .70266 | .70475 | .70682 | .70889 
.21391 |.21309 | .21227 | .21145 | .21061 | .20977 | .20893 | .20808 | .20723 | .20636 
0.9 |.71095 |.71300 | .71504 | .71708 | .71910 | .72112 | .72312 | .72512 | .72711 | .72909 
. 20550 |.20463 | .20376 | .20288 | .20200 | .20111 | .20022 | .19932 | .19842 | .19752 


1.0 |.73106 |.73302 | .73497 | .73692 | .73885 | .74077 | .74269 | .74460 | .74649 | .74838 
.19661 |.19570 | .19479 | .19387 | .19295 | .19203 | .19110 | .19017 | .18924 | .18831 
1.1 |.75026 |.75213 | .75399 | .75584 | .75768 | .75951 | .76133 | .76315 | .76495 | .76674 
.18737 |.18643 | .18549 | .18455 | .18360 | .18265 | .18171 | .18075 | .17980 | .17885 
1.2 |.76852 |.77030 | .77206 | .77382 | .77556 | .77730 | .77903 | .78074 | .78245 | .78415 
.17790 |.17694 | .17598 | .17502 | .17407 | .17310 | .17214 | .17119 | .17022 | .16926 
1.3 |.78583 |.78751 | .78918 | .79084 | .79249 | .79413 | .79576 | .79738 | .79899 | .80059 
.16830 |.16734 | .16637 | .16541 | .16445 | .16349 | .16253 | .16157 | .16060 | .15965 
1.4 |.80218 |.80377 | .80534 | .80690 | .80845 | .81000 | .81153 | .81306 | .81457 | .81608 
.15869'|.15772 | .15677 | .15581 | .15486 | .15390 | .15295 | .15199 | .15105 | .15009 


1.5 |.81757 |.81906 | .82054 | .82201 | .82346 | .82491 | .82635 | .82778 | .82920 | .83062 
.14915 |.14820 | .14725 | .14631 |] .14537 | .14443 | .14350 | .14256 | .14163 | .14069 
1.6 |.83202 |.83341 | .83480 | .83617] .83753 | .83889 | .84024 | .84158 | .84290 | .84422 
.13976 |.13884 | .13791 | .13699 | .13607 | .13515 | .13424 | .13332 | .13242 | .13151 
1.7 |.84553 |.84684 | .84813 | .84941 | .85069 | .85195 | .85321 | .85446 | .85570 | .85693 
13061 |.12970 | .12881 | .12791 | .12702 | .12613 | .12524 | .12436 | .12348 | .12260 
1.8 |.85815 |.85936 | .86057 | .86176 | .86295 | .86413 | .86530 | .86646 | .86761 | .86876 
12173 |.12086 | .11999 | .11913 | .11827 | .11741 | .11656 | .11571 | .11486 | .11402 
1.9 |.86989 |.87102 | .87214 | .87325 | .87435 | .87545 | .87653 | .87761 | .87868 | .87974 
11318 |.11234 | .11151 | .11068 | .10986 | .10904 | .10823 | .10741 | .10660 | .10580 


2.0 |.88080 |.88184 | .88288 | .88391 | .88493 | .88595 | .88695 | .88795 | .88894 | .88993 
10499 |.10420 | .10340 | .10261 | .10183 | .10104 | .10027 | .09949 | .09873 | .09795 
2.1 |.89090 |.89187 | .89283 | .89379 | .89473 | .89567 | .89660 | .89752 | .89844 -89935 
09720 |.09644 | .09568 | .09493 | .09419 | .09345 | .09271 | .09198 | .09125 | .09052 
2.2 |.90025 |.90114 | .90203 | .90291 | .90378 | .90465 | .90551 | .90636 | .90721 -90805 
08980 |.08909 | .08837 | .08766 | .08696 | .08626 | .08556 | .08487 | .08418 | .08350 
2.3 |.90888 |.90970 | .91052 | .91133 | .91214 | .91293 | .91373 | .91451 -91529 | .91606 
08282 |.08215 | .08147 | .08081 | .08014 | .07949 | .07883 | .07817 .07753 | .07689 
2.4 |.91683 |.91759 | .91834 | .91909 | .91983 | .92056 | .92129 | .92201 .92273 | .92344 
07625 |.07562 | .07499 | .07436 | .07374 | .07313 | .07251 | .07191 -07130 | .07070 


— tN 
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TABLE 2 (cont’d.) 


ANTILOGITS AND WHIGHTS 


Upper figure is p for specified value of logit 1; lower figure is corresponding w = pq. 
If 1 is negative, p is 1 minus the tabled value. Linear interpolation with 2 figures 
additional to the argument listed yields values correct to 5 decimal places. 


l 0 1 2 3 4 5 6 tf 8 9 


2.5 |.92414 |.92484 | .92553 | .92622 | .92690 | .92757 | .92824 | .92891 | .92956 | .93022 
.07011 |.06951 | .06892 | .06834 | .06776 | .06718 | .06661 | .06604 | .06548 | .06491 
2.6 |.93086 |.93150 | .93214 | .93277 | .93339 | .93401 | .93462 | .93523 | .93584 | .93643 
.06436 |.06381 | .06326 | .06271 | .06217 | .06164 | .06111 | .06057 | .06004 | .05953 
2.7 |.93703 |.93761 | .93820 | .93877 | .98935 | .93991 | .94048 | .94103 | .94159 | .94213 
.05900 |.05850 | .05798 | .05748 | .05697 | .05648 | .05598 | .05549 | .05500 | .05452 
2.8 |.94268 |.94321 | .94375 | .94428 | .94480 | .94532 | .94583 | .94634 | .94685 | .94735 
.05403 |.05356 | .05309 | .05262 | .05215 | .05169 | .05124 | .05078 | .05033 | .04988 
2.9 |.94785 |.94834 | .94883 | .94931 | .94979 | .95026 | .95073 | .95120 | .95166 | .95212 
.04943 |.04899 | .04855 | .04812 | .04769 | .04727 | .04684 | .04642 | .04600 | .04559 


3.0 |.95257 |.95302'| .95347 | .95391 | .95435 | .95478 | .95521 | .95564 | .95606 | .95648 
.04518 |.04477 | .04436 | .04897 | .04357 | .04318 | .04278 | .04239 | .04201 | .04163 
3.1 |.95689 |.95730 | .95771 | .95811 | .95851 | .95891 | .95930 | .95969 | .96007 | .96046 
.04125 |.04088 | .04050 | .04014 | .03977 | .03940 | .03904 | .03869 | .03834 | .03798 
3.2 |.96083 |.96121 | .96158 | .96195 | .96231 | .96267 | .96303 | .96339 | .96374 | .96408 
.03764 |.03729 | .03694 | .03660 | .03627 | .03594 | .03560 | .03527 | .03495 | .03463 
3.3 |.96443 |.96477 | .96511 | .96544 | .96578 | .96610 | .96643 | .96675 | .96707 | .96739 
.03430 |.03399 | .03367 | .03337 | .03305 | .03275 | .03244 | .03214 | .03185 | .03155 
3.4 |.96770 |.96802 | .96832 | .96863 | .96893 | .96923 | .96953 | .96982 | .97011 | .97040 
.03126 |.03096 | .03068 | .03039 | .03010 | .02982 | .02954 | .02927 | .02900 | .02872 


3.5 |.97069 |.97097 | .97125 | .97153 | .97180 | .97208 | .972385 | .97262 | .97288 | .97314 
.02845 |.02819 | .02792 | .02766 | .02740 | .02714 | .02689 | .02663 | .02638 | .02614 
3.6 |.97340 |.97366 | .97392 | .97417 | .97442 | .97467 | .97491 | .97516 | .97540 | .97564 
-02589 |.02565 | .02540 | .02516 | .02493 | .02469 | .02446 | .02422 | .02399 | .02377 
3.7 |.97587 |.97611 | .97634 | .97657 | .97680 | .97702 | .97725 | .97747 | .97769 | .97790 
-02355 | .02332 | .02310 | .02288 | .02266 | .02245 | .02223 | .02202 | .02181 | .02161 
3.8 |.97812 |.97833 | .97854 | .97875 | .97896 | .97916 | .97937 | .97957 | .97977 | .97996 
-02140 |.02120 | .02100 | .02080 | .02060 | .02041 | .02020 | .02001 | .01982 | .01964 
3.9 |.98016 |.98035 | .98054 | .98073 | .98092 | .98111 | .98129 | .98148 | .98166 | .98184 
-01945 |.01926 | .01908 | .01890 | .01872 | .01853 | .01836 | .01818 | .01800 | .01783 


4.0 |.98201 |.98219 | .98236 | .98254 | .98271 | .98288 | .98304 | .98321 | .98337 | .98354 
-01767 |.01749 | .01733 | .01716 | .01699 | .01683 | .01667 | .01651 | .01635 | .01619 
4.1 |.98370 |.98386 | .98402 | .98417 | .98433 | .98448 | .98463 | .98478 | .98493 | .98508 
-01603 |.01588 | .01572 | .01558 | .01542 | .01528 | .01513 | .01499 | .01484 | .01470 
4.2 |.98523 |.98537 | .98551 | .98566 | .98580 | .98594 | .98607 | .98621 | .98635 | .98648 
.01455 |.01442 | .01428 | .01413 | .01400 | .01386 | .01374 | .01360 | .01346 | .01334 
4.3 |.98661 |.48674 | .98687 | .98700 | .98713 | .98726 | .98738 | .98751 | .98763 | .98775 
.01321 |.01308 | .01296 | .01283 | .01270 | .01258 | .01246 | .01233 | .01222 | .01210 
4.4 |.98787 |.98799 | .98811 | .98823 | .98834 | .98846 | .98857 | .98868 | .98879 | .98890 
-01198 |.01187 | .01175 | .01163 | .01152 | .01141 | .01130 | .01119 | .01108 | .01098 


4.5 |.98901 |.98912 | .98923 | .98933 | .98944 | .98954 | .98965 | .98975 | .98985 | .98995 
01087 |.01076 | .01065 | .01056 | .01045 | .01034 | .01024 | .01014 | .01005 | .00995 
4.6 |.99005 |.99015 | .99024 | .99034 | .99043 | .99053 | .99062 | .99071 | .99081 | .99090 
-00985 |.00975 | .00966 | .00957 | .00947 | .00938 | .00929 | .00920 | .00911 | .00902 
4.7 |.99099 |.99108 | .99116 | .99125 | .99134 | .99142 | .99151 | .99159 | .99167 | .99176 
00893 |.00884 | .00876 | .00867 | .00859 | .00851 | .00842 | .00834 | .00826 -00817 
4.8 |.99184 |.99192 | .99200 | .99208 | .99215 | .99223 | .99231 | .99239 | .99246 | .99253 

-00809 |.00801 | .00794 | .00786 | .00779 | .00771 | .00763 | .00755 | .00748 | .00741 
4.9 |.99261 |.99268 | .99275 | .99283 | .99290 | .99297 | .99304 | .99310 | .99317 -99324 

-00734 |.00727 | .00720 | .00712 | .00705 | .00698 | .00691 | .00685 | .00678 | .00671 


a a |e | | a | 


BIOASSAY FROM A PARABOLA 
Ci Banss 


The Connecticut Agricultural Experiment Station and Yale University, New Haven, 
Connecticut, U.S.A. 


Most bioassays are based upon parallel, straight log-dose response 
lines fitted to the data for the Standard preparation and for the sample 
or Unknown. The distance between these two lines on the log-dose 
axis measures the log-relative potency of the Unknown. Curves that 
are initially non-linear can often be rectified by transforming the 
response to a suitable metameter and restricting the dosage range. 
The statistical analysis of such linear bioassays is now well developed 
(Bliss [1952]; Finney [1952]). 

The requirement of linearity has sometimes been a problem in 
the microbial assay of the vitamins. In a recent study (Bliss [1956a]) 
of vitamin 5,, assays, either the percent transmittance or its logarithm 
could usually be plotted as a straight line against the log-dose, at least 
within a range of from 1.5 to 4 ml. of test solution per tube. The linear 
relation often covered a wider range of doses, but occasionally there 
was significant curvature even within the narrower range. When 
culture tubes are prepared, sterilized, inoculated, incubated and read 
in a systematic design, as is commonly the practice, an apparent curva- 
ture may be due to their arrangement. Such positional effects are well 
established (Bliss [1956a]; Brownlee and Lapedes [1951]; Campbell 
ef al. [1953]). But even with randomization, some curvature may persist. 

In several cases of curved regressions, the plotted points for the 
Standard and the several Unknowns could be fitted with parallel 
parabolas. Even though the quadratic term was highly significant, 
the variance attributable to the linear term was usually more than 
100 times larger. Curvature of this magnitude has seemed to have 
little effect upon the assayed potency and its confidence interval, when 
these are computed as if the relation were linear (Finney [1952)}). 
Accordingly, quadratic curvature that accounts for less than 1 percent 
of the variation attributable to the linear component has been con- 
sidered admissible in the assays in U.S.P. XV. How good is this rule? 
Should the analysis be based instead upon a series of parallel parabolas? 
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These problems will be considered in relation to the four-dose microbial 
assay of vitamin B,,. For the two illustrative assays, I am indebted 
to Parke, Davis and Company and to Food Research Laboratories, Inc. 


Initial analysis of variance. Two numerical examples have been selected 
that had non-linear log-dose response relations in terms of the original 
turbidimetric response, but with curvature in opposite directions. In 
each laboratory, four test preparations (Unknowns 1 to 4, and A to D) 
were assayed against the same Standard by the basic turbidimetric 
procedure in US.P. XV. ‘Test solutions of each Unknown, roughly 
equipotent with that of the Standard, were added to culture tubes in 
triplicate at dosages of 1.5, 2,3 and 4 ml. Each tube was then brought 
to a constant volume of 10 ml. with 5 ml. of basal medium and water. 
In Laboratory A, the three tubes for each dose of each preparation 
were placed in three different tube racks in a randomized block design. 
In Laboratory B, the replicate tubes were not separated and dosage 
levels and preparations were arranged systematically. ‘This common 
but far less satisfactory design is based upon the dubious premise that 
rack position has no effect upon the response. Without endorsing this 
assumption and for illustrative purposes only, we will proceed as if it 
were true. 

The completed racks of tubes were then sterilized, inoculated, 
incubated overnight and the percent transmittance determined photo- 
metrically. The turbidimetric response for each tube, in terms of 
y = (100 — % transmittance), has been totalled over the f = 3 repli- 
cates for each treatment; these totals, 7, = >> y, are given in Table 1 
for the randomized assay (Assay A), and in Table 2 for the systematic 
assay (Assay B). The mean responses for the Standard and two Un- 
knowns from each assay have been plotted in Figure 1 against the 
log-dose, coded as x} = —29, —12, 12 and 29. Both series of points 
curve systematically but in opposite directions, as if the generally 
smaller responses in Assay A were approaching a lower asymptote and 
the much larger responses in Assay B an upper asymptote. 

The easiest non-linear curve to compute statistically is the parabola, 
Y =a’ + b,x + b,x”. Just as the log-doses x have here been replaced 
by the linear coefficients x, , each x” can be replaced by the orthogonal 
polynomial values r, = 1, —1, —1 and 1 respectively (Bliss [1956c]). 
The expected response Y at each observed dose of a given preparation 
can then be computed from the mean response (g) and the polynomial 
coefficients, B, = Ts/f >| a] and B, = T¢/f >) x2, as 


Y= 9+ Biz, + Box, (1) 
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TABLE 1 

Assay A 
Total Tesponse for each treatment, 7, = >> y, where y = (100 — % trans- 
mittance) in each of three tubes, from a vitamin By assay of four test preparations 


or Unknowns (U; to Us) against the Standard (S), arranged in three randomized 
sets, each in a different tube rack. 


Dose, Value of Response 7 from 3 tubes for preparation Total 
ml/tube | 2 2 S (Gh Us U. ue Ta 
LD. —29 1 50 55.5 55 47 52 209.5 
2 —12) —I 61.5 66.5 66.5 61.5 64 320 
3 12 —-1 84 91 91 85 94 445 
4 29 1 105 inGieeesy ally 103 108.5 | 542 
a “3 f ide BOUL m sol yes 2OeD 296.5 318.5 |1566.5 = 7 
Ti = om aT: 1865 =1831 2092 1906 1998.5 9692.5 = T, 
T, = Lal 9.5 6.5 14.5 35 25 19365 = 7, 
ge sg ay ig 21 29 =4 18 
Fory = T:, 90 — Gs 5.25 7.25 —1.00 4.50 
y2, — 798.00 1263.50 —134.50 761.25 
TABLE 2 
Assay B 
Total response, 7; = >. y, in triplicate tubes (not randomized) from a vitamin 
By assay of four Unknowns (U4, to Up) against the Standard (8); in the same units 
as Table 1. 
Dose Value of Response 7’, from 3 tubes for preparation Total 
ml/tube | 2 2: S Us Up Ue Up Ta 
1.5 —29 1 126) §=6126 117 122 123 614 
2 S17 1 143. «154 136 144 147 724. 
3 2 ail lives ileAsy 167 172 172 859 
4 29 1 186 190 179 187 187 929 
Trae T's 628 645 599 625 629 3126 = T 
=> mT: 2100 2108 2170 2221 2156 10755 = 7, 
This 3) als —4 —13 af hi —9 —40 = T, 
A= ia, le —29 —3 1 
Fory = T:,9u — 9s 4.25 —7.25 —.75 25 
y?, YY 1373.00 —2183.75 —169.25 110.25 
u— Ys 
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FIGURE 1 


Log-Dosr RESPONSE CURVES FoR y = (100 — % TRANSMITTANCE) IN THE VITAMIN 

By Assays A AND B In TaBieEs 1 AND 2, Frrrep wiTH PARABOLAS PARALLEL IN y; 

x, Is THE CopED LOG-ML. or TEST SOLUTION PER CuLTURE TUBE. NoTE FROM THE 
Broxen Horizontat Lines THAT THE PARABOLAS Arp Not PARALLELIN 2) . 


where Tf = >) x,T, and T’ = >> vT,. If in terms of y the log-dose 
response curves for the h’ = 5 preparations were parallel within the 
sampling error, the same coefficients B, and B, would apply to all 
preparations. They would then be computed from the totals 7, = >> T/ 
and T, = >> T/,, summed over all h’ preparations, as B, = Twa Soe 
and B, = T.,/h’f >> «2. The parallel curves in Figure 1 have been so 
determined, with Y = g + 0.3280x, + 0.60832, for Assay A, and as 
Y = 9 + 0.36402, — 0.66672, for Assay B, each with its own g from 
12 y. 
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The assumption of parallelism has been tested by the analyses 
of variance summarized in Table 3, computed from the data in Tables 


TABLE 3 
Results of analysis of variance in units of y? of the data for Assays A and B in Tables 
1 and 2. 
Assay A Assay B 
Row Source D.f. | Mean square F Mean square F 
1 |Between preparations f 16.72 9.47) 22.98 16.61 
2 |Assay slope, B? i) Buby ay 1801. 3914.38 2834. 
3 |Quadratic curvature, Q? 1 22.20 12.58) 26.67 19.31 
4 |Cubic curvature 1 1.88 .62 
5 |Preparations X slope 4 1.91 41 
6 |Preparations X quadratic] 4 1.98 .92 
7 |Preparations X cubic $ 1.38 3.00 
8 |Error (Rows 4-7), s? 13 1.765 1.00 1.381 1.00 
9 |Replicate tubes ¢? 38 ,39 .818 46 . 564 Al 


1 and 2 and expressed in units of a single tube. Although the variation 
was somewhat erratic in the systematic Assay B, it is clear from the 
mean squares in rows 4 to 7 that five parabolas, parallel in y, would 
fit the data for each assay satisfactorily, the preparations differing 
only in their vertical positions (row 1). In each assay, both the linear 
(B,) and the quadratic (B,) coefficients were highly significant, although 
B, accounted for far more of the total variation in y than B,. Hach 
error (row 8) has been computed from the variation of the means about 
the fitted parabolas. This exceeded the variance é* between replicates 
(deducting the rack effect in Assay A) by more than two-fold (row 9). 
The discrepancy between s” and é is commonly much larger in system- 


atic designs than in Assay B (Bliss [1956a]). 


In standard bioassay procedure, the significant quadratic curvature 
in both assays would call for a transformation of the response. Lacking 
a single, overall function, the shape of the curves in Figure 1 suggests 
the transformation z = log y for Assay A, and z = 2 — log (100 — y) 
for Assay B. The response in each individual tube was transformed 
accordingly, and analyses of variance were computed from their respec- 
tive z (Table 4). In each case, the transformation eliminated the 
initially-significant quadratic curvature, so that both could be fitted 
by parallel straight lines with a slope of B, . For the preparations in 
Figure 1, the mean responses z have been replotted in Figure 2 against 
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FIGURE 2 
Log-Dosr Response CuRVES ror z = LoG y (Assay A) AND z = 2 — Loa (100 — y) 
(Assay B), Fitrep wiTH PARALLEL STRAIGNT LINEs. 


the coded log-dose x, and Assay A fitted with the equations Z = 2 + 
0.005599x, and Assay B with Z = Z + 0.003314, . Although the 
original curvature could here be handled effectively by a simple trans- 
formation, cases may arise where the data cannot be rectified so easily. 


Log-relative potencies and half-confidence intervals. The availability of 
the linear transform z provides a criterion against which we may test 
various alternatives. In terms of y, B” for the assay slope in both 
examples was more than 140 times larger than Q’, the quadratic curvature 
for the assay (Table 3). If this curvature, though significant, were 
ignored, and each log-potency were computed from y with the equation 
for a linear relation, how large would the discrepancy be? 
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In a factorial assay with h’ — 1 Unknowns and parallel, straight 
log-dose response curves, the log-relative potency M’ of each Unknown 
can be computed from the terms defined in Tables 1 and 2 as 


M’ = kcih’T,/T, (2) 


where 7’, = Ty — Tg and, with the present dosage sequence, ci = 7.2332 
(Bliss [1956c]). 

Solving Equation (2) for each assay, first with the numerical values 
from y in Tables 1 and 2 and then with the corresponding terms from 
the transform z, gave the first two rows of log-relative potencies in 
Table 5. In comparison with M’ based upon the linear transform 2, 
the most discrepant linear estimate of M’ from y differed by less than 
one percent, as determined from the antilogarithm of the difference, 
and for the eight Unknowns the difference averaged only 0.38 percent. 
With the quadratic term 7’, initially positive, the M from the orginal 
y averaged larger than those from the linear transform z; when 7, was 
initially negative, the M from y were smaller. 

For a similar comparison of confidence intervals, 3 may be com- 
puted from linear log-dose response curves (Bliss [1956b,c]) as 


1 = VC — DCM” + Beth) (3) 
where the slope factor C = B’/(B* — s*f’), B’ and s’ are taken from 


corresponding rows in Tables 3 and 4, ¢ denotes Student’s ¢ at P = 0.05, 


TABLE 4 


Results of analysis of variance in units of 10°z?, where z = log y for Assay A (Table 1) 
and z = 2 — log (100 — y) for Assay B (Table 2). 


Assay A Assay B 
Row Source D.f. | Mean square F Mean square F 
1 |Between preparations 4 5013 11.39 1918 17.81 
2 |Assay slope, B? 1 | 926 262 2105. 324 518 3013. 
3 |Quadratic curvature, Q? 1 120 27 224 2.08 
4 |Cubic curvature 1 739 102 
5 |Preparations < slope 4 503 49 
6 |Preparations X quadratic] 4 432 50 
7 |Preparations X cubic 4 310 226 
8 |Error (Rows 4-7), s? 13 440.0 1.00 107.7 1.00 


and c’z2 = 0.10623. Note that for this comparison, the error variance 


42 BIOMETRICS, MARCH 1957 


in computing C depends upon the variation about the parallel curves, 
whether these are straight lines or parabolas. In consequence, it does 
not include Q’. 


TABLE 5 


Comparison of log-relative potencies M’ and their half-confidence intervals $1 
when computed from parallel straight lines (Equations 2 and 3) with the response y, 
ignoring curvature, and with its linear log-transform z, and from parallel parabolas 
(Equations 8 and 9) with the response y. Hach mean difference (ignoring sign) and 
mean bias (considering sign) is measured from the estimates determined from z. 


Statistic) Response} Eqn. Estimate for unknown Mean Mean 
Assay A| variate U, U2 U3 U, | difference bias 
M’ y 2 .03918  .05410 —.00746 .03358) .00266 .00148 
2 2 .04164  .05104 —.01071 .03170 


8 .04050 .04948 —.00841 .03163) .00127 .00012 


1 y 3 .02635 .02642 .02628 .02633) .00200 .00200 
Zz 3 .024386 .02440 .02429 .02433 
y 9 .02393 .02406 .02386 .02391) .00040 |—.00040 
Statistic] Response} Eqn. Estimate for unknown Mean Mean 
Assay B | variate U4 Uz Uc Up | difference bias 
M’ y 2 .02858 —.04876 —.00504 .00168) .00059 |—.00039 
2 2 .02955 —.048388 —.00443 .00129 


8 .03015 — .04667 —.00312 .00266) .00125 .00125 


nie 
oe 
ow 


y .02097 .02103 .02093 .02093) .00064 .00064 
Z 3 .02032 .02038 .02029 .02029 
9 .02169 .02174 .02165 .02164| .00136 .00136 


The results in Table 5 for 3 again show only small discrepancies 
between the two methods of linear estimation, z2 = log y giving consist- 
ently the shorter half-confidence interval. Comparing the log-relative 
potencies from y and from z respectively as percentages of the half- 
confidence intervals for z, the discrepancy in M’ averaged 6.9 percent 
and varied from 2 to 13 percent. 


Bioassay from a parabola. If our assay is based directly upon a para- 
bola, we encounter a major difficulty in parallel-line assays where the 
log-dose response relation is not effectively a straight line. 

As the independent variate, the log-dose x is assumed to be free of 
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experimental error. All of the sampling variation is presumably in the 
dependent variate, the response y. In consequence, there is only one 
regression line, that of y as a function of the log-dose x. If the several 
preparations can be fitted by straight lines that are parallel in y, these 
lines are also parallel in x. The log-relative potency M’ is then the 
distance in units of x between the line for the Standard and that for 
each Unknown; it is the same at all levels of y. 

When the relation between the response and the log-dose is non- 
linear, however, curves that are parallel on the y axis are not parallel 
in terms of x. The distance in x between two parabolas that are parallel 
in units of y, as in Figure 1, increases (or decreases) continuously as 
the response increases. This discrepancy vanishes when the trial 
potency of an Unknown, assumed in preparing its test solution, is 
confirmed exactly by the assay. Its log-relative potency M’ is then 
precisely zero, and the observations for the Standard and for this 
Unknown can be fitted by a single curve, here a parabola. If x is the 
log-dose of the Standard, this parabolic curve has the equation 


Yg=at bet b3x” (4) 


The responses from any other Unknown, for which M’ # 0, can be 
plotted on the same curve by adding M’ to each assumed log-dose, or 


Yy = a+ bie + M’) + bo@ + My (5) 


If a series of such curves, with a separate M’ for each Unknown, were 
computed so as to minimize the sum of squares of all the y-deviations, 
the resulting values of M’ would be our required log-relative potencies. 

Especially in an assay with several Unknowns, this method of 
computing M’ could be very heavy. For an iterative solution, we 
might start with provisional M’ computed from the y’ by Equation (2). 
Thereafter, coding would no longer be practicable. Doses would be 
expressed directly in terms of log-ml. of test solution, and Equations 
(4) and (5) fitted by customary regression techniques. With the best 
estimates of M’, the reduced mean square for preparations should 
vanish in an analysis of covariance with x and x” as the covariates. 
The adequacy of the M’ in Table 5, that were computed from the y’ 
with Equation (2), has been tested in this way with 3-place log-doses. 
From the analysis of covariance summarized in Table 6, the reduced 
mean squares for error differ but little from the corresponding values 
in Table 3, but those for preparations are now less than one hundredth 
as large. Each observed mean response has been plotted against its 
adjusted log-dose in Figure 3 and the 20 observations in each assay 
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TABLE 6 


Reduced mean squares and F ratios from the analysis of covariance of y? about the 
parabolas plotted in Fig. 3. 


Row Source Df Assay A Assay B 
Mean square F Mean square F 
1 |Between preparations 4 004. . 002 .014 .009 
2 |Linear component, 6, 1 3179.09 2002. 3914.33 2543. 
3 /Quadratic component, b, | 1 24.58 15.48 24.66 16.02 
4 |Error 13 1.588 1.00 1.539 1.00 
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fitted with a single parabola. Both tests confirm the relatively good 
approximation of the M’ which were computed from the y’ on the 
assumption of a linear relation. 


Fitting parabolas that are parallel in x. Instead of adjusting the several 
M’ until the mean square between preparations is exactly zero, a more 
practicable expedient computationally is to determine each M’ directly. 
As suggested by the analysis of variance for a discriminant function 
(Cochran and Bliss [1948]), let us consider x (or z,) as a dummy variate 
and reverse the direction of fitting. The bias which would be expected 
to result from this reversal seems to be negligible, 7f the M’ differ no 
more from zero than those in our two examples and 7f the scatter about 
the curves is of a similar magnitude. When we compute parallel 
parabolas with x (or z,) as the “‘dependent”’ variate and y as the ‘“‘inde- 
pendent” variate, the resulting curves will not be parallel in terms of y. 
Since the variation in xz has been minimized, the distribution of the 
individual observations about the fitted lines has no clearly defined 
form. However, the calculation is much easier than meeting the 
conditions of Equations (4) and (5), although it is still not practicable 
for routine assays and has the theoretical limitation of reversing the 
known order of causation. 

With the exact values of y varying unpredictably, the following 
sums of squares and products are computed for each preparation: 
[y"], fy], [y‘], [yx] and [y*z,], where each term within square brackets 
is measured from its mean. The corresponding terms are then summed 
over all preparations, leading to two simultaneous equations 


Do Wyler + 2 y*lbs = 2 [yar] 6) 
> loi + DO [yt = DO ya] 


Solving for bj and b; and remembering that z; = 0, we have from % 
and y* for each preparation and any selected response Y the parallel 
curves 


I 


X, = bY — 7) +u(Y — yx) (7) 

Note that 7’ is the mean of the observed y’, not the square of the mean 9. 
Parabolic log-dose response curves, parallel in terms of x, , have 
been calculated from the assays in Tables 1 and 2. For convenience, 
each 7’, has here been designated as “‘y’”. In each assay, the sums of 
squares and products were computed separately for each preparation 
and their sums substituted in Equations 6 to obtain the assay slopes 
by Equation (7) of bf = 1.87700 and b: = —0.005 437 1 for Assay A, 
and b; = —0.35650 and b, = 0.004 0940 for Assay B. Equation (7) 
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was then solved with the observed g and y’ for each preparation at 
selected responses Y to obtain the parallel parabolas plotted in Figure 4. 
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By analogy with similar analyses of variance for the discriminant 
function, and in view of its “robust’’ character statistically, the ade- 
quacy of the fitted parabolas has been tested in Table 7, with 2, as 
the dependent or dummy variate, in a form resembling that in Tables 
3 and 4. Since >) x, = 0 for each of the five preparations, there was 
no mean square between preparations. Each total sum of squares 
was computed as 5 >) 2; = 9850, with h’(k — 1) = 15 degrees of freedom. 
For the variation attributable to the assay slope, we have B” = 
> [yxl/>> [y’]. This was subtracted from the total variation attri- 
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TABLE 7 


Results of analysis of variance of Assays A and B in units of the dummy variate 
xi , Where x is the coded log-dose. 


Row Source Df Assay A Assay B 
Mean square F Mean square F 
1 |Assay slope, B3 1 9712.09 2148. 9738.99 2618. 
2 |Assay quadratic, Q’? 1 79.13 17.50 62.65 16.84 
3 |Preparations X slope 4 7.20 3.27 
4 |Preparations X quadratic| 4 .66 3.40 
5 |Remainder 5 5.46 4.33 
6 |Error (Rows 3-5), s? 13 4.521 1.00 3.720 1.00 


butable to the parallel parabolas to obtain the assay curvature in row 2, 
Q? = bf >> [yxi} + bf >> [y’x,] — B”?. The variation among prepara- 
tions in slope (row 3) was computed from straight lines fitted individually 
to each preparation with the reversed coordinates, and that for quadratic 
curvature (row 4) from parabolas fitted separately for each preparation. 
When compared with the residual variation about the five separate 
parabolas (row 5), there was no more reason to question the essential 
parallelism of the fitted curves in units of x, (or x) than there had been 
previously in units of y. Both the assay slope and the assay curvature 
were unquestionably significant in comparison with the assay error. 


The log-relative potency from parabolic log-dose response curves. If 
Equation (3) is solved with 7; and ye for the Standard at a selected 
response Y to obtain X,s5 , and the equivalent value X,, is determined 
at the same Y for a given Unknown, the difference between them will 
be the log-relative potency in units of zx, . This can be obtained by a 
single step and the log-relative potency computed in units of w as 
M’ = i(X1s — Xi) = 1(biGu — Gs) + bilys — ys)}. = (8) 
For both Assays A and B, the interval in logarithms corresponding to 
a unit interval in x, is 7 = 0.007 343 2. The differences in the means 
in the last two rows of Tables 1 and 2 and the slopes bj and bj lead 
directly to the third series of M’ in Table 5. 
The confidence interval for each log-relative potency computed 
from parallel parabolas with x as the dummy “dependent” variate is 


40 a ist V 9 /te+6::(Go—Gs)? + 21 Gu -Fs(ys— Ys) +en(yo—ys)” (9) 


where k is the number of dosage levels, s° is the error variance in terms 
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of x, , and fis read from a table of Student’s ¢ at P = .05 with the degrees 
of freedom in s*. Since x is treated here as a dependent variable, zero 
or near-zero differences in 7 and in y” cannot reduce the confidence limit 
below some finite value related to the number of observed responses for 
each preparation. By analogy with the confidence intervals for standard 
bioassays, this “base” is provided by the first term under the radical. 
The remaining three terms are the coefficients from an inverse matrix, 
here equal to 


Assay A Assay B 
ti= >, ly‘]/D= 0.009 779 385 0.025 507 57 
C2=— >, [y*]/D=—0.000 060 127 12 —0.000 082 473 04 (10) 


C2= >, ly'|/D= 0.000 000 373 632 6 0.000 000 267 541 3 


where D = >> [y’] >> [y‘] — ><? [y*]. Substitution of these values in 
Equation (9), with ¢ = 0.007 3432, s from Table 7, ¢ = 2.160 and 
k = 4, gives us the half-confidence intervals $2 for each Unknown in 
Table 5. 

These parabolic estimates of M’ and of LZ are compared in Table 5 
with the values computed previously from the linear transform 2. 
The largest discrepancy in M’ was 0.53 percent and for the eight Un- 
knowns the difference averaged 0.29 percent, which was less than that 
between the M’ computed from y and from z with Equation (2). The 
half-confidence intervals for the parabolic estimates differed but little 
from those based upon the linear transform z, being somewhat smaller 
in Assay A and larger in Assay B. As percentages of the half-confidence 
intervals for z, the discrepancy between the parabolic M’ (Equation 
(8)) and those from z averaged 5.7 percent with a range from 0.3 to 
9.5 percent. Thus the parabolic estimates agreed somewhat better 
with those based on the linear transform than did those computed 
from y ignoring the curvature. 

The routine calculation of log-relative potencies from parabolic 
curves is obviously impracticable. Fortunately, in the great majority 
of vitamin B,. and probably of other microbial assays, either y or z 
plots linearly against the log-dose. Changing the overall concentration 
of the test solutions may obviate the need for “logging” the response. 
By distributing replicate tubes in different racks and at random, curva- 
ture due to positional effects can be eliminated. Where quadratic 
curvature still persists, but Q’ is less than one percent of the variation 
attributable to the assay slope (B’), it should have little effect upon 
factorial estimates of potency and their confidence intervals. These 
estimates, computed as if the lines were straight, should approach in 
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reliability those determined after first transforming each response to 
a linear metameter. They should be more reliable than graphic esti- 
mates from a single Standard curve, which ignore the information on 
slope from the other preparations in the assay. Judging from this 
study, the factorial analysis of log-relative potencies and their confidence 
intervals would classify as “robust” statistical procedures. 


Summary. Some bioassays in which the response plots against the 
log-dose x with a slight but significant curvature can be fitted by 
parallel parabolas. This occurred in two microbial assays for vitamin 
B,, with the response y = (100 — % transmittance), the curvature 
being positive in one assay and negative in the other. In the present 
case, a logarithmic response metameter z rectified each curve and served 
as a check on analyses in terms of y. 

Bioassay from a parabola encounters the major difficulty that 
log-dose response curves that are parallel in y are not parallel in z. 
If each log-dose « of an Unknown were corrected by adding to z its 
log-relative potency M’, the responses of the Standard and of all Un- 
knowns could be plotted against their respective corrected log-doses 
as a single parabola. The corrections which would minimize the sum 
of the squared deviations in y about this single curve are the required 
log-relative potencies. A more practicable, though theoretically less 
valid, technique is to reverse the direction of fitting and compute 
parabolas with xz as a dummy or ‘‘dependent”’ variate and with y as 
the ‘‘independent”’ variate. An analysis of variance of two numerical 
examples showed that each series of five preparations could be fitted 
equally well by parabolas that were parallel in terms either of y or of 2. 
Equations are given for the log-relative potency and its confidence 
limits computed from parabolic log-dose response curves parallel in 2. 

A comparison of the log-relative potencies and_half-confidence 
intervals for the four Unknowns in each of the two assays showed 
relatively small discrepancies in the values computed (i) from parabolas 
fitted to the y and parallel in 2, (ii) from the non-linear y as if there 
were no curvature, and (iii) from the linear transforms z by the same 
equations. When curvature is smal] and a fully effective linear trans- 
form of the response is not available, the potency computed factorially, 
as if the response were linear against the log-dose, seems to have so 
small a bias that the resulting log-potencies and their confidence intervals 
would classify as “robust” statistics. 
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AN EVALUATION OF SOME STATISTICAL TECHNIQUES 
USED IN THE ANALYSIS OF PAIRED COMPARISON DATA 


J. EDWARD JACKSON AND Mary FLECKENSTEIN 


Eastman Kodak Company, Rochester, New York, U.S.A. 


The psychophysical method of paired comparisons has been tra- 
ditionally used to scale the responses to several stimuli within a limited 
response range. Thurstone’s method of statistical analysis of paired 
comparison data has been discussed in the literature and widely used 
for over 25 years. In the past five years several other methods have 
been proposed; it is the purpose of this paper to evaluate briefly and 
compare some of the various techniques and to illustrate the comparisons 
by means of an actual color preference test. The methods to be dis- 
cussed have been contributed by Thurstone (including some refinements 
by Mosteller) (Guilford [1954]), (Mosteller [195la, 1951b], Thurstone 
[1945]), Bradley and Terry (Bradley [1953, 1954a, 1954b, 1955], Brad- 
ley and Somerville [1955], Bradley and Terry [1952], Terry and Brad- 
ley [1952]), Scheffé [1952], Kendall [1947, 1948, 1955], Morrissey [1955] 
and Gulliksen [1956]. An attempt has been made below to maintain 
consistency in notation within our present paper rather than to agree 
with that of the authors cited. 


THE THURSTONE-MOSTELLER METHOD 


It is assumed that the responses r; (¢ = 1, --- , k) for each of the 
k stimuli to be compared are independent and have equal variances 
(Thurstone’s Case V) on a response scale defined to be the Subjective 
Continuum. The sampling distribution of any 7, is assumed to be normal 
and hence the distribution of any 7; — 7; is normal. If a simplification 
is applied which equates to unity the variance of r; — 1; , pi; , the 
estimated proportion of time that stimulus 7 is preferred to stimulus 9, 
is the area under the normal curve with normal deviate —(r; — 1;), 


that is, 


it i —27/2 
a= e dz. 
@ \/ Qn —(ri—ri) 
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Since the observed proportional preference of 7 over j, c;; , can be con- 
sidered an estimate of p;; , the response scale locations can then be 
estimated by setting 


(G6 — 
a/ 2a (Ri i) 


and solving for the quantity —(R; — R;). By this operation the matrix 
of preferences 


e” dx 


Row 
sum 
Ci1 Ci2 C1; Cik cr 
cd 
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: 
C= 
Ci Ci2 Ci; Cik Ds C3; 
7 
Cure (Gxa Ce Cry, ye Cig 
7 
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phy Ky, — Re ‘ins Riviy, aug ae ae Hees kip 
R, — Rk, R, — R, acta Rk. — RK; bie Rk, — R, 
Ree ; ; : ; : 
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R is a skew symmetric matrix with all diagonal elements equal to zero. 
Summing the first row in the R matrix and dividing by k yields 


kR, — i 5 
i Rp Ran. 


Similarly, the sum of the 7th row is 


Ry — R= 7. 
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Thus, all the r, can be determined in terms of normally distributed 
deviations about the mean # and plotted on the response scale. Bliss, 
Greenwood and White [1956] have modified this procedure by replacing 
the normal deviates with “rankits” (scores for ordinal, or ranked, data). 
Mosteller has developed a goodness-of-fit test in which the r; are 
used to predict the original c;;. The test is roughly a comparison of 
the observed c;; with the estimated p;; . Poor results in the goodness- 
of-fit test may be caused by many factors including the following: 


1. Occasionally, a sample c,; will be either 1 or 0, for which the cor- 
responding normal deviates are + and —«. Substitutes have been 
suggested for + © varying from +2 to +4 and for — © varying from 
—4 to —2, although there is no distinct preference for any one value. 
It is suggested that if one stimulus can be judged unanimously by all 
observers, it could be evaluated by a more simple test. However, an 
occasional paired comparison of this type may occur and, in using some 
substitute normal deviate, the goodness-of-fit test is affected adversely. 
2. A relationship may occur that is known as a circular triad (Kendall 
[1947, 1948]) when e.g. stimulus 7 is preferred over stimulus j, stimulus 7 
is preferred over k, but k is preferred over 7. Another version of the 
relationship, a secondary circular triad, involves a strong preference 
of 7 over j, a strong preference of 7 over k, but only a mild preference 
of iz over k. If there are very many circular triads in a single experiment, 
the goodness-of-fit test results will be unsatisfactory. Circular triads 
may arise from such causes as (a) inconsistent judgments on the part 
of an observer or observers, (b) actual interactions between observers 
and stimuli resulting either from improper design of the experiment or 
from a real subjective interaction, or (c) no real differences between 
the stimuli. 

3. Another possible reason for a poor fit is the invalidity of the assump- 
tion of equal variances for the responses to the stimuli. Guilford and 
Burros [1951] give a solution for estimating the variances for each 
response and obtaining adjusted response scale values. (Thurstone’s 


Case III). 


THE BRADLEY-TERRY METHOD 


In the method proposed by Bradley and Terry, ratings p; (¢ = 
1, --- , k) associated with each of the stimuli are estimated, such 
that the probability that stimulus 7 will be chosen over stimulus j is 


ag ey 
Pit = + Bi 
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Log p; corresponds to the response scale rating r; of Thurstone. The 
sample estimates, p; , are found by the solution of the following k 
nonlinear equations: 


so al ie lee eee 


il 
Di + Peo aera Ds a De 
: 1 Pa ee a <. —1_| 
p= Seu) ee ay se Peper 


1 1 Ms 
a Eon = Di Se + pz one Ty a 
where DS, ¢;; are row sums from the matrix C shown above. As these 
are nonlinear equations, an iterative solution must be used. A starting 
approximation for p,; is p?) = >|; ¢,,;/[(e — 1) — (k — 2).>0; ex). 
(Dykstra [1956]). 

Complete tables of p have been prepared for k = 3,n = 1, --- , 10; 
k=4n7 =1,:-+,8;6 = 557 = 1, ->~ 5. = number of observers): 
All possible combinations of the sums of ranks (sums of ranks are defined 
as n{2(k — 1) — >; ¢,;}) and the corresponding p; are tabulated. 
These tables also give the probability with which a particular combina- 
tion of sums of ranks will occur by chance alone. Included in the tables 
is the likelihood ratio statistic 6, used in significance tests analogous 
to analysis of variance tests. Differences between stimuli and inter- 
actions between observers or replication and stimuli may be tested. 
Bradley [1955] has developed a procedure for testing a subset of the 
stimuli against the remaining stimuli by means of the sign test. Pro- 
cedures for obtaining confidence limits for p; , log p; , p; + p; , and 
log (p; + p;) are described by Bradley [1955], Bradley and Somerville 
[1955], and by Hald [1952]. 

Bradley and Terry have also described a goodness-of-fit test in 
which the p; are used to predict the original proportions for each pairing. 
Poor fits result from the same factors as those listed for the Thurstone- 
Mosteller method. An individual pairing in which a unanimous decision 
is made in favor of one stimulus does not present the computational 
problems of the Thurstone-Mosteller method. However, if one stimulus 
is never chosen, it will receive a rating of p = 0 and will not appear on 
the response scale at all. A stimulus that is always preferred will 
receive a rating of p = 1, while all the other stimuli will receive ratings 
of p = 0, eliminating the response scale entirely. However, the pre- 
ferred stimulus could be deleted from the analysis and a response scale 
obtained from the remaining stimuli. 
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THE SCHEFFE METHOD 


Scheffé’s method is essentially an analysis of variance using scored 
data. Moreover, there are two basic differences between his model 
and those previously discussed. 


1. Scoring systems are used by which the stimuli are rated according 
to the degree of preference rather than by a preferential selection. 
The following seven-point scale illustrates the scoring of stimulus 
2 when compared to stimulus j: 


l| 


observer prefers 7 to 7 strongly. 

= observer prefers 7 to 7 moderately. 
= observer prefers 7 to 7 slightly. 
observer has no preference. 

= observer prefers j to 7 slightly. 

= observer prefers 7 to 7 moderately. 
= observer prefers 7 to 7 strongly. 


WoNrR OF W WwW 
ll 


The experimenter may have as many or as few degrees of preference 
as seems advisable. However, as Scheffé points out, a too finely 
divided scale will produce distortion in the results The use of 
only +1 and —1 scores reduces the model to that of methods pre- 
viously discussed. 


There is some question about use of the 0 category, since it gives 
the observer an excuse to make no decision. If the 0 category were 
omitted and no real difference existed between stimulus z and stimulus 
j, then, by chance alone, 50% of the observers would be expected 
to record a preference for each stimulus anyway. 


2. The experiment is completely replicated with the order of presenta- 
tion reversed in the second replication. That is, if stimulus 7 is 
observed before stimulus 7 the first time, 7 is presented after stimulus 
j the second time. Order of presentation, although not an important 
factor in some experiments, has an extreme influence on the results in 
taste tests and similar experiments. For each pair in the exper- 
ment to be judged by n different observers, nk(k — 1) observers are 
required, although fewer can be used if the experiment is properly 
balanced. In the example which follows, the entire set of pairs 
were judged in each order by n observers. 


The sources of variation and their corresponding degrees of freedom 
for an analysis of variance with k stimuli and 2n observers are as 


follows: 
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Source of variation Degrees of freedom 
Main effects k-—1 
Deviations from subtractivity CQ) Dkie — 3) 
Average preferences (1/2)k(k — 1) 
Order effects (1/2)k(k — 1) 
Means k(k — 1) 
Error k(k — 1)(n — 1) 
Total nk(k — 1) 


The main effects refer to differences between stimuli as in a simple 
analysis of variance. Deviations from subtractivity, comparable to 
an interaction term, is analogous to the goodness-of-fit tests in the 
methods previously discussed. These two sources of variation 
comprise the average preferences. The remainder of the analysis 
is concerned with the effect of order of observation on the preferences; 
in fact, a single degree of freedom can be isolated from the Order 
effects for an average order effect. The Hrror term is obtained in 
the usual way, by subtraction, and represents the variability un- 
explained by differences between stimuli, the effect of order, and 
their interactions. 


Estimates of the response scale ratings a; (corresponding to r, 
and log p,; of the other methods) are the average scores for each stimulus. 
As in the methods previously discussed, p;; , the preference of stimulus 
7 over stimulus j, can be estimated. If s? is the error mean square then 

Ck — Ox 
8. 
is distributed as ¢ with k(k — 1)(n — 1) degrees of freedom, and p,; 
may be found directly from tables of ¢ (Pearson and Hartley [1954], 
Scheffé [1951]). Procedures for establishing confidence limits on the 
quantities a; — a; are presented in this method (Scheffé [1952]). 


THE WORK OF M. G. KENDALL 


One of the pioneers in the statistical methodology of ranked data 
is M. G. Kendall. Among his proposed techniques, two that are appli- 
cable to paired comparison data are the calculation of the coefficient of 
agreement, a measure of agreement among observers, and of the co- 
efficient of consistency, a measure of the consistency of an individual 
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observer. Similarity of observer judgment over the whole experiment 
is measured by the coefficient of agreement. Agreement on the part of 
all observers about all pairs is the perfect case producing a coefficient 
of agreement of 1.0. The coefficient of consistency is calculated by obtain- 
ing the ratio of the number of circular triads of an observer to the total 
possible number and subtracting the ratio from unity. The secondary 
circular triad has no effect on the coefficient or on the significance test. 
High values of either coefficient indicate consistent observers or wide 
separation of stimuli. Low values denote observer inconsistency or the 
lack of real differences among stimuli. Kendall also describes tests to 
determine significance of the coefficients. 

In a recent article Kendall [1955] presents a technique applicable to 
the incomplete case; the rating scale is estimated by an iterative pro- 
cedure which involves powering of the frequency matrix. 


THE MORRISSEY-GULLIKSEN METHOD 


Mosteller has shown [195la] that the method of Thurstone has a 
least squares solution. Morrissey and Gulliksen have developed 
independently a technique to provide a similar solution for the case in 
which not all the comparisons are made. Morrissey has specifically 
applied the technique to experiments in which marked differences exist 
between the extremes of the stimuli. The method is applicable to tests 
in which there are missing data or meaningless comparisons. Care 
should be taken in the choice of pairs, however, in order that the experi- 
ment will remain reasonably balanced. 

In the analysis of the data Morrissey lists all of the observed pairs 
in a (g + 1) X k matrix in such a manner that each row specifies a 
pair, 1.e., 


1 2 7 k 

Onin iands 2 |b 1 + + 0, ee 0 

pair 1 and j 1 One ele eae 0 

pair 1 and k 1 Os = WE Ose 
=X 

paged 00 eet 0 

pair j and k 0 () nei ie eh 
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where g is the number of pairs observed and k is the number of stimuli 
in the test. 


Each of the first g rows represents one of the included pairs; the last 
row consists of ones in each column, added to transform X’X into a 
nonsingular matrix. Corresponding to this matrix is a (g + 1) X 1 
column vector d whose ith element is the normal deviate for the pro- 
portion preferred of the pair represented by the 7th row of X. A zero is 
placed in the last row of d corresponding to the last row of X since this is 
the mean of the normal deviates. The k X 1 column vector correspond- 
ing to the r, of the Thurstone-Mosteller method is r, calculated by 
solving Xr = d using the least squares criterion. The quantity (d — Xr)’ 
(d — Xr) is minimized by the solution 


TeX Xe ed 
The standard error of estimate is given by 


(Ge seae= eS). 
y= (oe 


Mosteller’s goodness-of-fit test with adjusted degrees of freedom 
is applicable to Morrissey’s method. Gulliksen’s solution is identical 
but employs an iterative procedure which reduces computation time 
considerably. 


ILLUSTRATIVE EXAMPLE 


A color-balance aimpoint is a quality control standard toward 
which photographic color processing is directed to obtain the most 
acceptable array of colors for a color print. A picture with optimum 
color balance has neutrals that look neutral (are not tinted with some 
particular hue), and the other colors depart consistently from this 
neutral reference. The standard is produced by a combination of film 
sensitivity, color filters over the camera lens, and processing character- 
istics. Color balance can best be represented by a trilinear plot as 
illustrated in Fig. 1. The points on the grid represent two gradients 
of filter concentration for a particular hue that produce visually graded 
saturation steps. 

Normal is the first approximation to neutral color balance and is 
usually established by preliminary tests involving the method of single 
stimuli. The true aimpoint falls somewhere in the gamut of colors 
represented in Figure 1, and is determined by appropriate use of the 
method of paired comparisons. A color gamut can be produced by 
printing a normal picture plus 12 more pictures of the same scene, each 
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FIGURE 1 
Tue Tri-Lingar Meruop or SPECIFYING COLOR BALANCE 


exposed with a different color filter. The picture printed with the 
Number 2 green filter was rejected by the method of single stimuli. All 
of the remaining pictures were acceptable to the average observer, but 
by means of the paired comparison technique, because of its powerful 
discrimination properties, the stimuli could be scaled. Each pair of 
transparencies was projected side by side, and the observer indicated 
on a scoresheet whether he preferred the print on the left or on the 
right and whether this preference was mild or strong. The two pro- 
jectors were matched to give the same illuminant quality since varia- 
tion in optical systems on different projectors may mask the differences 
in the transparencies. The pairs were reversed as far as the projectors 
were concerned for the second replication to fulfill Scheffé’s requirement 
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about order. To illustrate the Morrissey-Gulliksen method, only 
adjacent pairs were used, reducing the number of pairs from 66 to 31. 


Analysis of the data 


Kendall’s test of consistency was computed for each observer as 
an initial step in the investigation. Although there was some varia- 
bility in the coefficients of consistency from observer to observer, all 
were significantly high. Since none of the observers was sufficiently 
inconsistent to warrant his exclusion from the experiment, all of the 
data were analyzed by the methods of Thurstone-Mosteller, Bradley- 
Terry, Scheffé, and Morrissey-Gulliksen. Only in Scheffé’s analysis 
were rated preferences and the order of presentation taken into account. 

Figure 2 illustrates the response scales (log p,) for each replication 
and for the total experiment as calculated by the Bradley-Terry method. 
There was no significant difference between replications. The response 
ratings are listed in Table 1. 
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FIGURE 2 
RESPONSE ScALES, BRADLEY-TERRY METHOD 
(B, C, G, M, R and Y denote Blue, etc. color filters listed in Table 1.) 


Figure 3 exhibits the similarity of the response scales for the total 
experiment as evaluated by all four methods. Even stimuli which 
are not significantly different are similarly scaled. There is perfect 
agreement in the rank of seven of the twelve stimuli; four more differ 
by only one place, and only one stimulus rank differs by two. For 
example, the proportion of the time that Red 1 would be preferred over 
Blue 2 as estimated by the different methods is: 


Di 
Bradley-Terry 0.621 
Thurstone-Mosteller 0.610 
Scheffé 0.616 


Morrissey-Gulliksen 0.602 


PAIRED COMPARISON DATA 


TABLE 1 
Brapiey-Trerry Response Ratrnes (p;) 


Stimulus Replication Replication 

I II 
Normal 0.088 0.098 
1 Red 0.168 0.153 
2 Red 0.130 0.171 
1 Green 0.030 0.033 
1 Blue 0.152 0.124 
2 Blue 0.119 0.081 
1 Cyan 0.053 0.065 
2 Cyan 0.010 0.007 
1 Magenta 0.134 0.117 
2 Magenta 0.035 0.052 
1 Yellow 0.060 0.062 
2 Yellow 0.021 0.038 


Table 2 compares the probabilities for four such pairs. 
deviation, 0.07, occurs between the response scale values for the complete 
methods and for the Morrissey-Gulliksen method, but it must be 
remembered that the incomplete method in this case uses only 47% 


of the pairs. 

LEAST PREFERRED BRADLEY — TERRY 
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Total 


0.093 
0.162 
0.152 
0.031 
0.138 
0.099 
0.059 
0.009 
0.126 
0.043 
0.061 
0.029 
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FIGURE 3 
CoMPARISON OF RESPONSE SCALES 


(B, C, G, M, R and Y denote Blue, etc. color filters listed in Table 1.) 
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TABLE 2 
PREFERENCE RATINGS 
1 Red over | 1 Red over | 1 Red over | 1 Red over 
Method 2 Blue 1 Blue 1 Magenta 1 Cyan 
Pij Pii Pij Pii 
Bradley-Terry 0.621 0.540 0.562 0.733 
Thurstone-Mosteller 0.610 0.540 0.544 0.729 
Scheffé 0.616 0.547 0.549 0.727 
Morrissey 0.602 0.554 0.611 0.700 


As far as the goodness-of-fit tests are concerned, only the Bradley- 
Terry ratings predicted the original data sufficiently well. The pre- 
diction failures in the Thurstone-Mosteller and Morrissey-Gulliksen 
methods were due conceivably to the incidence of several unanimous 
decisions, or to the inequality of the variances contrary to the assump- 
tion. The significance of the Deviations from subtractivity mean square 
in Scheffé’s analysis of variance indicated that the fit was not good, 
although the error term was quite small and had nearly 2000 degrees of 
freedom associated with it. 


COMPARISON OF THE METHODS 


In the illustrative example, the response scales are almost identical. 
This has been the case in all similar comparisons made by the authors, 
not only in studies in the field of color photography, but also in a few 
experiments in unrelated fields. When, as in this instance, the response 
scales are almost identical, the choice of method will depend on other 
factors. Scheffé’s method, of course, has the advantage of allowing 
for graded degrees of preference. His model provides a method for 
testing the order of observation, although an order test could be made 
in the other methods by using two replications. The Bradley-Terry 
method, with a minimum of assumptions, covers the experiment most 
completely by means of significance tests and confidence limits, although 
the other methods could be modified to obtain similar results. 

The Thurstone-Mosteller method is by far the easiest with respect 
to computing. Morrissey’s method involves the inversion of ak & k 
matrix the first time a particular design is used, but otherwise is much 
like the Thurstone-Mosteller procedure. However, Gulliksen’s pro- 
cedure is much faster. The labor involved in calculating Scheffé’s 
solution is comparable to a more complicated analysis of variance. 
Calculations for the Bradley-Terry method can be made rapidly for 
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k = 5or less because of the availability of tables. Fork > 5, the solution 
of & non-linear equations in k unknowns as well as the solution of B, 
must be obtained. Dykstra’s starting approximations provide very 
good estimates of the p; , however, and only three iterations were 
necessary in the example to obtain all values of p; to .001. 


SUMMARY 


Although all four methods give approximately the same response 
scales, each has advantages for specific situations. If more efficient 
use of the data will result from inclusion of graded degrees of preference, 
Scheffé’s method should be used. Scheffé’s model provides a method 
for estimating and testing order of presentation. For a complete 
analysis of a paired comparison experiment, the procedures proposed by 
Bradley and Terry are most effective. For routine work in which the 
primary interest is the response scale, the Thurstone-Mosteller method 
is preferable because of the ease in computation. The Morrissey- 
Gulliksen method is useful in reducing the size of an experiment or in 
eliminating vastly different pairs. 
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For many years research entomologists have realized that insect 
populations have a tendency to be heterogeneously distributed. This 
heterogeneity is present in the distribution of larval populations of 
the European corn borer, Pyrausta nubilalis (Hbn.). To the research 
entomologist conducting experiments on larvae of the corn borer an 
understanding of this heterogeneity, or ‘‘spottiness’’, in the distribution 
of the insect would be of help in the design of experiments. 

Distributions may be fitted for either one of two reasons: (a) to 
find a transformation in order to use normal theory, for example, in 
performing an analysis of variance (here it does not matter if the form 
of distribution fitted is particularly accurate); or (b) in relating observed 
counts to some theory of population growth or spread (requiring forms 
of distributions that are biologically significant.) 

The purpose of this paper is to report the results of fitting three 
biologically significant statistical models, the negative binomial, Ney- 
man’s type A and the Poisson binomial to entomological field data. 


REVIEW OF LITERATURE 


Much has been written on the general application of contagious 
or compound Poisson distributions to plant and insect populations. 
For the purposes of this paper, however, only references that are con- 
cerned with the European corn borer or the Poisson binomial distribution 


have been included. 
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Simanton ef al. [1931] pointed out that they had never encountered 
homogeneous corn borer populations at any time. They said, ‘‘It seems 
more evident that corn borer populations are naturally spotty. A 
condition that permits large and erratic variations of adjacent plot 
populations from each other and from their mean.” 

Beall [1940] proposed to explain the observed heterogeneity in 
corn borer populations by fitting three mathematical distributions due 
to Neyman [1939] and two due to Pélya [1931] to four frequency distri- 
butions obtained by sampling plots treated alike in an experiment 
designed to test the efficacy of controlling the European corn borer 
with the parasitic fungus Beauvaria bassiana (Bals.). 

In fitting the distributions Beall used the method of moments to 
estimate the parameters. First, he fitted the Neyman type A distribu- 
tion to the observed frequency distributions and found that only one 
of them was well fitted by it. He then proceeded to smooth each of 
the four observed frequency distributions by eye and to the four 
smoothed distributions he fitted all five mathematical distributions. 
Of these smoothed distributions he found one to be fitted best by the 
Neyman type B, two by the Neyman type A, and one by the negative 
binomial. 

Bliss [1953], using Beall’s observed data and estimating the param- 
eters of the distributions by the method of maximum likelihood, con- 
cluded that the negative binomial was the appropriate mathematical 
distribution for fitting corn borer data. 

Finally, Evans [1953], again using Beall’s original data, and esti- 
mating the parameters of the distributions by both the method of 
moments and the frequency of zeros and ones, and in some cases also 
by the method of maximum likelihood for the negative binomial, agreed 
with Bliss on the negative binomial as the correct distribution for 
fitting corn borer data. 

As is evident from the above review of the work all conclusions as 
to the appropriate mathematical distribution are based on Beall’s data. 
Since the empirical frequency distributions obtained by Beall were 
from non-contiguous plots and were based on a small number of sampled 
observations, it was felt that his data might have given only an approxi- 
mate answer to this question of the appropriate distribution. Therefore, 
it was thought worthwhile to get uniformity data from more extensive 
experimental areas and to re-examine the fitting of the more important 
contagious distributions. 

All of the distributions referred to are known, in general, as con- 
tagious distributions. Biologically speaking, the name is unfortunate. 
In the present case, contagion is only apparent (Feller [1943]) and 
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may be defined as an inflation of the variance in a population which 
should be described by a Poisson distribution. In this instance it 
therefore indicates the condition of “spottiness” previously discussed. 
A more adequate name would be compound Poisson, as used by Feller 
[1943], since, in general, they consist of a Poisson variable compounded 
in a Poisson or Gamma distribution, or a binomial variable compounded 
in a Poisson distribution. In the authors’ opinion the name compound 
Poisson is more descriptive than contagious. 


METHODS 


During the summer of 1952, all of the 3205 corn plants growing on 
an area located in Northwest Iowa were dissected. During the summer 
of 1953, three additional areas were sampled by dissecting one plant, 
selected at random, from each hill. 

Each area consisted of approximately 4 acre chosen from a different 
corn field. An attempt was made to select it from parts of the field 
on level ground so as to make it as uniform as possible. All four areas 
were square in shape and included 36 rows of corn. The field sampled 
in 1952 had 54 hills per row at intervals of 2.3 feet while the three 
sampled in 1953 had 36 hills per row at intervals of 3.5 feet. Hach 
area was divided into 324 plots and a coordinate system was employed 
to specify the exact location of each plant in the area. The dissection 
of the plants in the 1952 area was completed in 4.5 days. In 1953 the 
plants were dissected in from 1.25 to 2.5 days per area. 

In all cases the data consisted of the number of cavities, and the 
number of borers by stages, for each plant dissected. All information 
was punched on I.B.M. cards and the frequency distributions used in 
this paper were obtained from the cards. The original data are the 
property of the Iowa State Experiment Station and have been issued 
under the title of “Uniformity Data from European Corn Borer, Py- 
rausta nubilalis (Hbn.), Populations’ as Mimeo Series No. 2, Statistical 
Laboratory, Iowa State College. 


FITTING THE DISTRIBUTIONS 


In this paper three different compound Poisson distributions are 
fitted to 9 observed frequency distributions from 4 sets of sampling 
data, (see Appendix). The three distributions are the negative binomial, 
the Neyman type A, and a distribution here called the Poisson binomial. 
This was first suggested by Neyman [1939], while Skellam [1952] 
fitted it to some plant data by expanding the probability generating 


function. 
For insects in the field the assumption made by Neyman that the 
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survival from each egg mass follows a Poisson distribution is somewhat 
stringent. In actuality the value of p for the European corn borer is 
in the range of 0.01. to 0.05, which although low does not approach 
sufficiently close to zero for the Poisson distribution to be more efficient 
than the binomial itself. In the present study the value of the parameter 
n in the Poisson binomial depends on the number of larvae surviving 
from each egg mass, at the time of dissection. A similar statement 
was made by Neyman [1939, page 39]. However, n does not have to 
be integral as long as the values of zero and one, which are points of 
discontinuity, are excluded, but by allowing non-integral values the 
biological significance as developed in Neyman’s model is lost. Mathe- 
matically the Poisson binomial is quite flexible; as m — © it can be 
shown to approach the Neyman type A distribution, and if0 <n < 1 
as n — 0 it comes closer and closer to the negative binomial. As a 
purely graduating curve it should be quite versatile. Recurrence 
formulae for calculating the probabilities are presented in this paper 
for the first time. Fisher’s [1953] method of maximum likelihood was 
used in fitting the negative binomial while the method of moments 
was used for the other two distributions. 

Beall has discussed the fitting of the Neyman type A and Bliss that 
of the negative binomial, so that it is not deemed necessary to describe 
those fitting procedures here. The fitting of the Poisson binomial, 
however, is explained. 

The probability generating function for the Poisson binomial is: 


(1) G(@) = exp {al(q + pé)” — 1]} 

from which the recurrence expressions for the distribution are obtained 
by setting 6 = 0 in 

(2) d°G(6)/de* where dG(0)/de° = G(é). 

The recurrence expressions are: 


OQ Rew 


ee ett ct — 1 [z=—t— CY ao ba tes 
OWN ae OS ( ; Jen = 1 IP oc 0, 
where (n = 1s = land (n — 1)! = (nm — 1)(m — 2)(n — 3), Pis 
the probability of a given x (cx = 0, 1, --- , ~) in the population and 
a, p, g and n are parameters of the distribution. 


It has been found that the labor of fitting is lessened if the recurrence 
formulae are left in the form 


Oy aiks = tere 
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z—1 


: al — 
(6) F. = pl Ds é ; Jen as tks i hy emt ieme ; x > 0, 
i=0 
and the probabilities are obtained from the relation P, = Foo 
The moments of the distribution are: 


Bi = apn 
(7) H2 = apn[l + (n — I)p] 
Ms = apn{l + 3(n — 1)p + (n — 1)(n — 2)p’]. 


The parameters of the distribution may be estimated by the method of 
moments using the relations, 


Fe eee 4 Se) 
ns—@)’ "~~ @—Dz 


(8) and q aad dl ae Pp); 

where is the sample estimate of the mean and s’ is the sample estimate 
of the variance. For clearness in the biological interpretation of the 
distributions only integral values of n greater than 1 were considered 
and, since with an increase in m the Poisson binomial approaches the 
Neyman type A, in order to get the maximum difference between the 
two curves the value of n to be used in the distribution was determined 
from the relation for estimating p in (8) above. The authors chose the 
smallest value of n such that p = 1. Neither maximum likelihood nor 
minimum chi square methods of fitting the parameters a and p are 
known under these conditions. In general if non-integral values of 
n are to be allowed the criterion 


” ae 


as described by Skellam [1952] may be used as an aid in determining 
which type of distribution to fit, and for the Poisson binomial n may 
be estimated by 


R-2 
(10) Ue pees 


In Equation (9) above K,,; is a factorial cumulant which may be derived 
from the factorial moments in the same manner as the cumulants are 
derived from the moments. In general, a useful value of n may be 
expected to be between 2 and 4 since the Poisson binomial approaches 
the Neyman type A quite rapidly. 

Distribution 9, Table 1, has been chosen as an example of the 
actual fitting of this distribution. 
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The first two moment estimates, using six decimals for all computa- 
tion which may be rounded off, are € = 0.648 148 and s° = 0.845 334, 
hence, substituting in Equations (6) with n = 2, 


ad = (n — 1)#’/n(s’ — &@) = (0.648 148)’/(2) (0.197 186) = 1.065 227, 
6 = (s° — #)/a(n — 1) = (0.197 186)/(0.648 148) = 0.304 229 
Qo = Ci po) = 0.695 Fil. 
In order to compute P, the value of ¢” is needed. 
G° = F = (0.695 771)’ = 0.484 097. 


Since (s”’ — «)/@ is less than one we may allow n to be 2. With n = 2, 
only two multipliers are required. These are 


g = 4 = 0.695771 and (mn — 1p"? = (pl) = 0.304 229. 


Using the above values for the parameters and multipliers, the 
required F, values are next calculated to be 


Hop neni = 6 a eee 80.57 10a 
F, = £¢° ‘Fy = (0.648 148)(0.695 771)(0.577 207) = 0.260 299, 
F, = €[m — 1)pq°°Fo + °F] 
= (0.648 148) [(0.304 229)(0.577 207) + (0.695 771)(0.260 299)] 
= 0.231 202, 
F; = ¢[2(n — 1)pg °F, + GF] 
= (0.648 148)[2(0.304 229)(0.260 299) + (0.695 771)(0.231 202)] 
= 0.206 918 
etc. Hence, the required probabilities are: 
Py = F,/0! = 0.577 207, 
P, = F,/1! = 0.260 299, 
P, = F,/2! = 0.115 601, 
P; = F;/3! = 0.034 486 


and so on. The expected frequencies are obtained by multiplying the 
probabilities in turn by the total frequency (324). 


RESULTS 


The results of fitting the three mathematical distributions to three 
observed frequency distributions from the 1952 field and to two fre- 
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quency distributions from each of the three 1953 fields are given in 
Table 1. 

In most cases the difference between the various fitted distributions is 
slight, and in some cases, for practical purposes, no distinction would 
be made between two or even all three of them. Had more efficient 
methods of estimation, consistent with the selected tests of goodness- 
of-fit, been available, the Poisson binomial distribution might perhaps 
have been fitted more closely, but even using the above estimating 
procedures, this model gave on the whole closer approximations to the 
data than did the negative binomial, as measured by an ordinary 
chi-square goodness-of-fit test. 

All three compound Poisson distributions were fitted to the nine 
observed distributions. In six cases the Poisson binomial gave the 
best fit. The negative binomial fitted best in three cases (Table 1). 
The fitting indicates that for these data the Poisson binomial and the 
negative binomial are the appropriate graduating curves in that order. 
An interesting question presenting itself is whether or not sampling a 
population of corn borer larvae enhances the skewness of the resulting 
frequency distribution and thus makes the negative binomial the 
appropriate distribution. 


DISCUSSION AND SUMMARY 


It has been shown that data from an area on which total larvae 
counts of the European corn borer, Pryausta nubilalis (Abn.) made, 
giving Distributions 2 and 3 (see Appendix), were best fitted by the 
Poisson binomial distribution. On the other hand when the corn borer 
population was sampled, as was done in the areas from which Distri- 
butions 4 to 9 (see Appendix) originated, the appropriate distribution 
was the negative binomial, provided the mean was above one. However, 
when the mean was less than one, it seemed that sampling made little 
difference and the best fitting distribution was again the Poisson bi- 
nomial. 

Completely random samples from a population of known distribution 
will tend to reproduce the parent distribution. Why then would sampling 
make so much difference in the observations from compound Poisson 
populations? A possible explanation may be deduced from the assump- 
tions underlying the three distributions fitted. The assumption under- 
lying the negative binomial is that there are many values of a Poisson 
mean which are distributed as a modified chi-square distribution. On 
the other hand the assumptions underlying the Neyman type A and 
Poisson binomial distributions are; (1) that eggs are laid in masses of 
N eggs each, (2) that these egg masses are laid at random over the field, 
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(3) that the hatching larvae will move away from the location of the 
mass equally in all directions, (4) that the number of survivors from 
each egg mass follows either a Poisson or a binomial law, and finally, 
(5) that there is only a limited distance (small relative to the size of the 
plot) over which any larva move from the egg mass. It seems logical 
to assume therefore that the important factor in both the Neyman 
type A and the Poisson binomial distributions is the area over which 
the larvae from a certain number of egg masses will interact. By 
taking all the information available, a complete census, this area inter- 
action is reflected in the type of distribution obtained. When a sample 
of one plant per hill is taken only the varying Poisson populations are 
measured by the sample, the area interactions may not be observed 
and the resulting distribution is therefore closer to the negative binomial 
than to the Neyman type A or Poisson binomial distributions. 

One other point should be discussed. If this sampling favors the 
negative binomial, why should a distribution with a low mean be fitted 
best by the Poisson binomial if the larval population was sampled? 
In the cornfield with a low larval population the infested hills will be 
found in localized areas with many surrounding non-infested hills. If 
one plant from each hill is dissected, as was done in the case of the three 
areas studied in 1953, then the sample is sufficient to show the true 
distribution of the insect since the larvae will tend to remain on the 
plants in a hill. In other words, less information is lost in thus sampling 
from a population with a low larval density than in sampling from a 
population with a high larval density. The area over which the larvae 
travel may vary directly with the population density. 

In summary, then, it has been shown for the data analyzed that with 
complete information on the corn borer population in a large area of 
corn, at least } acre, the appropriate mathematical distribution to 
describe the population of larvae seems to be the Poisson binomial. 
When, on the other hand, the information was incomplete, based on 
a restricted random sample, the appropriate distribution seems to be 
the negative binomial for fields with a high density and the Poisson 
binomial for fields with a low density. It must be pointed out, however, 
that even for the sampled fields with the low densities the negative 
binomial may be an excellent approximating distribution. 
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APPENDIX 


Frequency Distributions Observed and Computed from Fitted Negative Binomial 
(NB), Neyman Type A (NTA) and Poisson Binomial (PB) Functions 


Observational units as specified in Table 1. Brackets indicate groupings adopted for goodness of fit 
tests. 


DISTRIBUTION 1 


We—3209 
Count per Observed Theoretical frequency 
plant frequency 
NB NTA PB (n = 2) 
0 355 824.30 331.79 341.84 
1 600 660.37 654.44 644.37 
2 781 734.06 734.77 728 .03 
3 567 610.45 608 . 67 609.14 
4 441 408. 82 411.61 415.60 
5 245 236.54 239.64 242.72 
6 135 122.49 124.09 125.17 
7 42 58.12 58.44 58.20 
8 17 25.68 25.45 24.76 
9 11 10.70 10.41 9.75 
10 iil 4.47 5.69 5.42 
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DISTRIBUTION 2 
Wi = 3205 
Count per Observed Theoretical frequency 
plant frequency 
NB NTA PBiGy =) 2) 

0 588 553.18 568.14 587.85 

1 807 845.72 829.36 799.38 

2 741 750.81 742.53 741.12 

3 479 506.58 509.95 515.08 

4 328 287 .83 293.57 299 .67 

5 159 145.14 148.37 150.76 

6 67 67.01 67.71 67.74 

a 22 28.89 28.43 27.64 

8 5 11.80 aE besa le} 10.39 

9 if 4.61 4.11 3.63 

10 te 2.76 1.92 1.74 

DISTRIBUTION 4 
N = 1296 
Theoretical frequency 
Count per Observed 
plant frequency NB NTA PB (n = 2) 

0 423 426.60 419.40 461.53 
1 414 406.36 405 .33 349.79 
2 253 248.65 257.06 259.29 
3 117 123.91 128.39 129.54 
4 53 54.71 54.70 60.14 
5 22 22.30 20.75 23.34 
6 4 8.58 7.15 8.45 
7 i 5 yc Alka) 2.29 2.75 
8 3 1.13 0.69 0.84 
9 | 2 0.60 0.27 0.32 


76 BIOMETRICS, MARCH 1957 


DISTRIBUTION 3 


NT == Sic 

Count Theoretical frequency |Count Theoretical frequency 

per | Observed per | Observed —_———. 

plot | frequency) NB |NTA|PB (n = 4)} plot frequency] NB |NTA|PB (n = 4) 
0 0 0.00} 0.01) 0.06 30 8  |11.08/11.37) 11.53 
1 0 0.00) 0.03} 0.03 31 11 10.24/10.58) 10.74 
2 0) 0.02) 0.08) 0.12 32 6 9.39) 9.95) 9.91 
3 0 0.05) 0.16} 0.27 Be ee 8.54, 8.91] 9.07 
4 OP)012 170220) 0837 34 9 | 7:72) 8.07] 8.22 
5 1 0.26) 0.50) 0.55 Bo 5 6.92) 7.26] 7.39 
6 1 0.49) 0.80} 0.97 36 tee 6.17] 6.47] 6.59 
7 1 0.84) 1.20) 1.34 Su ut 5.47] 5.73] 5.83 
8 0 Th at 7A We 38 { 7 4.82) 5.03} 5.12 
9 5 2.01) 2.36) 2). 47 39 5) 4.22) 4.39) 4.46 
10 2 2.85} 3.12} 3.26 40 4 3.68] 3.81] 3.85 
11 { 5 3.84, 4.01} 4.02 41 1 So oe 2s eeeot 
12 6 4.97| 4.99} 4.98 42 5 2.75) 2.80] 2.82 
13 { 8 6.21) 6.06} 6.08 43 2 2.36) 2.38] 2.38 
14 3 7,50) 67. 181" G.93 44 3 2.02} 2.01) 2.00 
16} 7 8.81] 8.32} 8.18 45 0 1272) 1.69) “167 
16 15 /10.08/ 9.44) 9.98 46 1 1.46) 1.41 E39) 
ie 12 /11.25/10.50} 10.27 47 0) Way] le ilydl ik Te 
18 10 = //12.30/11.48] 11.99 48 2 1.03} 0.96} 0.94 
19 10 //13.19]12.35! 12.16 49 0 0.37| 0.79] — 0.76 
20 12 /13.89]13.07/ 12.87 50 1 0.72) 0.65} 0.62 
21 11 _/14.38]13.63} 13.43 51 0 0.60; 0.53] 0.50 
22 14 14.67/14.01| 13.87 52 i 0.50) 0.42/ 0.40 
23 ILS) TAS 75i14 227) faa 53 0 0.41/ 0.34 0.39 
24 13 |14.64}14.24] 14.18 54 0 0.34) 0.27; 0.25 
25 17 = {14.35/14.09] 14.08 55 0 0.28) 0.22) 0.20 
26 17 |13.90/13.79) 13.84 56 1 0.23) 0.17) 0.16 
27 LO 1373313 85! 13.49 57 0 0.19} 0.14} 0.12 
28 15 |12.65/12.78] 12.88 58 0 0.15) 0.11] 0.09 
29 18> 2089/1211) 1298 59 1 0.12) 0.08! 0.07 
60+ 0 0299)°3.34| 2.85 
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DISTRIBUTION 5 


a 


N = 324 
Count per Theoretical frequency 
plot Observed frequency 
NB NTA PB (n = 3) 
0 10 8.92 12.41 16.20 
1 18 22.96 23.14 19.87 
2 39 BO 33.50 33.34 
3 33 42.32 39.74 38.43 
4 42 43 .34 41.41 40.46 
5 56 39.92 39.16 39.27 
6 36 34.01 34.28 34.54 
7 26 23h 28.17 28.74 
8 19 20.92 21.94 22.51 
9 19 15.42 16.32 16.74 
10 if 11.02 11.65 11.93 
11 4 7.66 8.03 8.15 
12 4 5.21 5.06 5.37 
13 4 3.47 3.47 3.42 
14 2 2.27 2.19 Payal 
15 1 1.47 1.35 I Par 
16 2 0.94 0.81 0.74 
17 1 0.59 0.48 0.42 
18 0 0.20 0.28 0.23 
19 0 0.12 0.16 0.13 
20 0 0.08 0.09 0.07 
21 0 0.05 0.05 0.04 
22 0 0.03 0.03 0.02 
23 0 0.02 0.01 0.01 
24 0 0.01 0.01 0.004 
25 if 0.41 0.01 0.004 
DISTRIBUTION 6 
N = 1296 
Count per Theoretical frequency 
plant Observed frequency 
NB NTA PB (n = 2) 
0 907 902.85 900.89 904.44 
1 275 288.86 288.76 279.42 
2 88 78.07 82.00 89.09 
3 23 19.81 19.34 18.63 
4 3 6.42 5.02 4.31 
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DISTRIBUTION 7 


ee a a ee 


N = 324 
Count per Theoretical frequency 
plot Observed frequency 
NB NTA PB (n = 2) 
(0) 89 89.53 89.01 100.97 
i} 96 91.85 88.63 69.66 
2 57 64.34 66.29 72.09 
3 44 38.10 40.41 38.69 
4 16 20.49 21.53 23.83 
5 11 10.36 10.38 10.65 
6 df 5.01 4.62 5.01 
di 3 2.34 1.93 1.94 
8 1 1.98 1.20 1.16 
DISTRIBUTION 8 
N = 1296 
Count per Theoretical frequency 
plant Observed frequency ———_—— 
NB NTA PB (n = 2) 
0 1117 1114.98 1116.97 1116.39 
1 149 154.51 150.44 150.47 
2 2 22).51 24.75 26.21 
3 3 3.99 3.84 2.92 
DISTRIBUTION 9 
Neo 
Count per Theoretical frequency 
plot Observed frequency ee 
NB NTA 1B} (ei ) 
0 188 185.79 187.99 197.02 
1 83 89.28 85.02 84.34 
2 36 32.99 34.52 87.45 
3 14 10.97 11.65 i ale 
4 2 3.45 3.51 3), hil 
5 il 1.52 1.30 0.91 


AN APPLICATION OF STOCHASTIC PROCESSES TO EXPERI- 
MENTAL STUDIES ON FLOUR BEETLES*,** 
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Division of Biostatistics, School of Public Health, University of California, Berkeley, 
UniSmAre 


1. INTRODUCTION 


This study is concerned with a general stochastic model of population 
growth in experimental studies on flour beetles. The population under 
consideration is that of eggs of flour beetles and the statistical investi- 
gation was initiated in a series of lectures by Professor J. Neyman. 

Consider an experiment which starts with a known number of eggs 
of flour beetles and a known number of pairs of flour beetles of a certain 
species contained in a vial with a definite amount of medium. Beetles 
used at the initiation of the experiment are those just emerged from 
pupae. Neither pupae nor larvae are present at any time in the course 
of the experiment. Experimental conditions are kept constant with 
respect to temperature, relative humidity, conditioning of medium, etc. 
At regular time intervals, observations are made on the number of eggs. 
To avoid hatching, after each observation the eggs are replaced by 
fresh eggs and the same beetles are returned to the vial. The experi- 
ment is carried out through the entire life span of the female beetles. 

Depending on the number of fresh eggs replacing the old ones, the 
experiment may be classified into two types. In the first type of experi- 
ment, the number of fresh eggs replacing the old ones is equal to the 
number of old eggs removed. In this case, the number of eggs observed 
at any time depends on the previous observations. In the second type 
of experiment, the number of fresh eggs is predetermined by the ex- 
perimenter. For example, the number of fresh eggs may be constant. 

The objective of the experiment is to determine, on the basis of the 
observed numbers of eggs, the female beetles’ ability to lay eggs (that is, 


*Presented at the joint meeting of the Institute of Mathematical Statistics and the Biometric 
Society, ENAR, in Chicago, Illinois, April 27, 1956, under the title, “Egg-laying-egg-eating stochastic 


processes of flour beetles’’. 
**Thig investigation was supported (in part) by a research grant (No. G-3666) from the National 


Institutes of Health, Public Health Service. 
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egg fecundity) as a function of their age. The problem is complicated, 
however, by the fact that flour beetles eat their own eggs. The observed 
number of eggs depends not only on fecundity of female beetles but 
also on the cannibalism of both male and female beetles. In order to 
determine egg fecundity of female beetles, it is necessary to estimate 
the loss of eggs owing to cannibalism. It is the purpose of the present 
study to suggest general methods of estimating egg fecundity and egg 
cannibalism of flour beetles from the numbers of eggs observed in the 
course of the experiment. The observed numbers of eggs are treated 
as random variables. In Section 2, general probability functions are 
derived for the random variables associated with each of the two types 
of experiment. For the first type of experiment, the dependence of the 
random variables is discussed in Section 3. Two methods of estimation 
are suggested in Section 4. To illustrate the application of the idea, 
particular functions of egg fecundity and egg cannibalism are con- 
sidered in Section 5. Finally, in Section 6, the problem of identifiability 
of parameters and other related points are discussed. 


2. A GENERAL STOCHASTIC MODEL 


Let z be the initial number of eggs at the initial time f) of an experi- 
ment and let k be the number of pairs of beetles. For every t > % , 
let a random variable Z, be the number of eggs observed at the time ¢. 
Let 


B.(t) At + o(At) = Pr{exactly one egg will be laid during the time 
interval (¢, ¢ + A#), given that there are exactly 
z eggs at time ¢}, and 


5,(t) At + o( At) = Prf{exactly one egg will be eaten during the time 
interval (¢, + AZ), given that there are exactly 
2 eggs at time /}, 


where the symbol o( A?) stands for a quantity which tends to zero faster 
than Af. These two probabilities may be regarded as ‘egg-laying’ 
probability and ‘egg-eating’ probability, respectively. Clearly, 6,(¢) 
is a function of egg fecundity and 6,(¢) is a function of egg cannibalism. 
The values of 6,(¢) and 6,(¢) also depend on the number of beetles, the 
number of eggs present in the vial, and the time of observation (which 
corresponds to the age of the beetles). Both 6,(¢) and 6,(¢) are to be 
estimated on the basis of the observed values of Z, . The purpose 
of this section is to derive the probability function of Z, , that is, 


(1) P,,.s(to , t) = Pr{Z, = z | Zo eggs at to} 
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forz = 0,1, 2,---. The probability P,,.,(t , f) is defined to be zero if 
z is negative. 

To derive (1), we first deduce an infinite system of differential 
equations for the probability functions. It is well-known that the 
system is of the following form.* 


& Prats) = — 1B.) + 8.0 IPanclts 1) + BeaPracealte » 8 


= 62+:(OP.,,241(to ’ t), for z= 0, 1, 2, ee 


In the experimental work considered in this study, beetles are of the 
same species, of the same age at any given time in the course of the 
experiment (since they all just emerged from pupae at initiation of the 
experiment); and the space in a vial and the amount of medium are 
large so that no competition may be assumed among female beetles 
for space to lay eggs. These considerations lead to the assumption 
that ‘egg-laying’ probability is proportional to the number k of female 
beetles present, that is, 


(3) B.(t) = kB(t). 


The ‘egg-eating’ probability may be assumed to be proportional to 
both the number k of pairs of beetles and the number z of eggs present: 


(4) 640) = kzo(i). 
The quantity 6(f) may be regarded as a measure of average fecundity, 
and the quantity 6(f) may be regarded as a measure of average canni- 
balism per pair of beetles. Substituting (3) and (4) in the system (2) 
gives 
5) a Peeello 1) = RIB + 26(0IP to D + ROP re eit» 1 
he LUO Es saallag Dy £00, 2 = 0,11, 2, 2%, 
System (5) describes, then, the growth of the population of beetle’s 
eggs in terms of the assumptions (3) and (4). The solution of this 
system is the desired probability function. 
In order to solve the system (5), we introduce the generating func- 
tion Gz(u, ¢) defined by 
Gz(u, t) = ye tie ve Ate ? t), 
z=0 


since this provides a convenient way of computing all the probabilities 
and the moments from the derivatives of this one function. We deduce 


*For the derivation of System (2), see, for example, Chiang [1]. 
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from (5) the differential equation for the generating function, namely, 
(6) = Galu, t) — kati — w) = Gu, ) = —kpA — WG-2, t). 


When ¢ = f) , 2 = 2 , hence the initial condition of (6) is 
(7) GAG) Be) Ss Ue 


The solution of the partial differential equation (6) corresponding to 
the initial condition (7) is easily found to be 


(8) G,(u, t) = Gx(u, )-Grtu, d, 

where 

(9) Gz(u, ) = (1 — pe + up.)” 

and 

(10) OG Oe 

with 

(11) p: = exp {-k i 6(T) an. 

and 

(12) y= kes {—x faery ar} act) exp {k ae) ax} ar 


The probability P.,,.(¢. , 4) can be obtained from (8) by the following 
relationship: 


5 HOR ee—t OA er 


u=0 


cy so wae 


It is noticed from Equation (8), however, that the random variable 
Z, is the sum of two independent random variables, say X, + Y, , 
whose generating functions are Gx(u, f) and Gy(u, f), respectively. 
Moreover, Gx(u, ¢) is the generating function of a binomial distribution 
in the case of 2 independent and identical ‘trials’ with the probability 
of ‘success’ in a single ‘trial’ p, defined by (11), whereas Gy(u, #) is the 
generating function of a Poisson distribution with parameter \, defined 


by (12). Thus, the probability P,,,.(¢. , #) may also be obtained from 
the equation, 


re Dd, Pr{X = z}-Pr{Y =z — 2}, 
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or, using the well-known formulas of the binomial and Poisson laws, 


Zo ! rab Vat 
i) Poh ,) = 0" (1 — p)?-* 2 
- vlto » 0) x x\(z. — x)! pl — pi) (2 — a)!’ 
for z = 0, 1, 2, --- . Formula (14) is then the desired probability 


function of the random variables Z, 

When observations on the number of eggs are made at regular time 
periods, we may denote by a constant 7 the time interval between any 
two consecutive observations. For convenience, the experiment starts 


at time ¢, = O and observations are made at t = 7, 27, 37, --- , nr. 
The entire range from 0 to nz covers the life span of the average beetle. 
For each z = 1, 2, --- , n, let a random variable Z; denote the number 


of eggs observed at time ¢ = zr. It follows from the preceding results 
[Eqns. (11), (12), and (14)] that, for the first type of experiment, the 
probability function of Z; is given by 


Zo —hiy 21-2 
-=zt$= eee a Nr ee 

= AU 2 wiz, — x)! PAE Pe) (2, — x)!’ 
for 7 = 1, 2, --- , n; with z being the initial number of eggs, 


(16) pi = exp \-k f 6(T’) ar}, for = 1,2, 3 
0 


and 


an = bow \-k il (2) a} ‘| a(2) exp {h Hl 5(T) an dt 


[OI 2 leo a eee 


Here again each of the random variables Z; is the sum of a binomial 
random variable and a Poisson random variable. The expectation and 


variance of Z; are, respectively, 


(18) EZ = eee Ot ly 2) oo 
and 
(19) Bi — ZopiAL — D:) + r; ) for 1 =s es Pay ee a 


In the second type of experiment, where after each observation a 
predetermined number of fresh eggs replace the old ones, the probability 
function of Z; is given by 


Zr i—1 et dis 


pee We 1-2 
a 1 a * Pe al Ae 
=0 riers a x)! pe “( B z (2; ie x)! ; 


(20) Pr{Z; =2} = 
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fori = 1,2, --- , n; where z* is the number of fresh eggs returned to the 
vial after observation made at ¢ = ir, and p* and \¥%* are defined re- 
spectively as follows: 


(21) exp {- | 6(t) at, fore =) 1 Deen a, 
(i-1)r 
and 
N= kb exp {x f 5(t) a A(t) 
(22) (Gr (ease 


t 
eae {i | 6(T) a} di, ior ia eee 
Gs =1)ir 
It may be interesting to note that the difference between p* and 7, , 
* and X,; lies in the lower limits of the integral signs and that they 
have the following relationships. 


(23) Piper Dp? =p, , 10r t= 12, 4-5 1, 
and 

* * * fs 
(24) ee ee TOR ta eee th 


The probability functions (15) and (20) depend on {() and 6(é). 
Since both fecundity and cannibalism of beetles may vary with the age 
of the beetles, attempts are made in Section 5 to discuss particular 
functions of B(/) and 6é(¢). We shall, however, first investigate the 
dependence of the random variables in the case of the first type of 
experiment. 


3. DEPENDENCE OF THE RANDOM VARIABLES 


In the first type of experiment, the number of eggs observed at 
any time in the course of the experiment depends on the preceding 
observations. The dependence may be investigated through the 
generating function of the joint distribution of all the random variables, 
Z,,2Z,, °°: ,Z,, and it may be expressed in terms of the covariance 
between Z, and Z; , or in terms of the correlation coefficient pz,z, , for 
Dies USO RE Ra? th Ps 700 » N. 

The generating function of the joint distribution of the random 
variables, Z, , Z. , --- , Z, , is defined by 


(25) Ga mestz thy 9 2 OS sae) = Elud*uz* Ee ee ib 


An explicit expression for this function can be derived by using the 
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identity 
(26) Etus*) = E[EWZ* | Z,-2)], 


and the generating function of the conditional distribution of the random 
variable Z; given Z;_, 


(27) G@z,12;_,(u;) — E(uz' Zigas)s 
It is easily seen from formulas (8), (9) and (10) that 
(28) Gaia, ee) = ep) te 


with p* and A* given by (21) and (22), respectively. Using the identity 
(26) for ¢ = n, we write 


Oy eees te 9 ea gg lhe) = Bl a” ° eth BG,” | Z,-3)), 


and, owing to (28), 


Ga ries 7 (Ue 5 tag Uh) 
(29) = Efug*uz? a Oh Yohei (| _— De — Upc sien ts 
— Eluy*uz* wee Tie ag  eiaraal (ahaa 


where 


Yaa = Upi(l — p*% + u,p*t) = up ee ; 
Pn-1 Pn-1 


or 


Pn 
Du— a 


In Formula (29) there are only n — 1 random variables, one less than 
we started with. Using Equations (27) and (28) once again, the random 
variable Z,,_,; drops out. By repeated applications of the same procedure, 
all the random variables are removéd from the expression of the gener- 
ating function, and finally the following formula is reached. 


i Uyet (1 an Un—1) == ele) — Un) + 


Gz, Z0i*+, Sn pigs » Un) 


30 : 
( ) os (1 - (1 2 v,)p,)" exp [- > (1 = ore |, 


where 


1-vy,= Dy pills si 90 u,;-1(1 =; )'. 
j=s Di 
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The right member of (30) contains the product u;u; whatever bet + 3; 
i,j = 1,2,---,mn. This implies that any pair of random variables 
Z, and Z; are statistically dependent. 

Formula (30) may be used to compute either the joint probability 
of the random variables, or the joint moments /[Z;'Z,” --- Z,"] what- 
ever be non-negative integers r. Easy calculation gives the expression 
for the covariance, 


(31) O72; — zop;(1 —));) te oN for Ly; tyJ = 12; POs 9 ho 


When « = j, Formula (31) reduces to Formula (19), the variance of 
Z, . It follows from (31) that the correlation coefficient between Z; 
and Z; is given by 

Pees zopi(1 — pi) + pids/Di 
(32) ie V [eop.s(1 — ps) + As][2ops(1 — pi) + Aj] 


for t= ee ea) P28 Me 


which is always positive whatever bez andj. It can easily be shown that 
(i) the correlation coefficient pz,z, 1s an increasing function of the initial 
number of eggs 2) ; and (ii) for a fixed 2, the correlation coefficient 
decreases as the interval (i7, jr) increases. Thus, a large number of 
eggs observed at any given time tends to increase the number of eggs 
to be observed at a later time; however, the longer the time interval 
between two observations, the less will be the effect of the outcome of 
the first observation on the second. 

Remark: For simplicity of presentation, the preceding discussion 
dealt with a single experiment. In actual work, however, replicates 
are performed of the same experiment. Let m denote the number of 
replicates. For convenience, each replicate starts at time 4 = 0. 
In the a-th replicate, for a = 1, 2, --- , m, let zo, be the initial number 
of eggs and let Z;, be the number of eggs observed at time t = ir. Let 


"1< : 
moe Ute 7 10rd 2 eee 


CS 2 Soe ah ee 
m a=1 


be the corresponding averages over the m replicates, and let 


(34) S3.= LG ca — Z)(Zia — Z;), for i,j = 1, 2, Ne 


It is found that for the first type of experiment, the expected values, 
variances and covariances of the averages are given by 


(35) E(Z;) = Zopi a rj ; for 1 — Le 25 eee Fup 
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and 


aa he os Di 
(36) OF 32; = m Ex Di) aa D; | 


for iS, J alge, 


Obviously, the S;; upon division by m are consistent estimates of the 
covariances (36). 

In the case of the second type of experiment, the expectations and 
variances of the averages are 


(37) BZ) = Ze pt +r¥, for 7 = 1,2,+-* ,n, 
and 
(38) oh, = > ttl — pt) + MI, for 4= 1,2, -- yn, 


at. = = Dod 245 for t= 1,2, 5% 5n, 


are the average number of fresh eggs replacing the old ones after observa- 
tion is made at time tf = (7 — 1)r. 


4. METHODS OF ESTIMATION 


In both the dependent case and the independent case, the probability 
functions (15) and (20) involve a number of parameters. These param- 
eters are contained in the functions @(¢) and 6(¢). Various hypotheses 
may be made regarding the egg fecundity and egg cannibalism on the 
basis of biological theory. In order that the general model presented 
in Section 2 be applicable to experimental work, it is essential to have 
a method of estimating the parameters in the probability functions. 
Because of the complexity of the probability functions, usual methods 
of estimation lead to formulas which are so intricate that it is difficult, 
if not impossible, to obtain explicit solutions. Methods suggested here 
are based on a recent investigation [2]. To save space, we shall be 
concerned mainly with the dependence case. Simple modification of 
the resulting formulas will lead to the solution for the independent case. 

Let Z; be the average number of eggs over m replicates observed at 
time ¢ = ir, and let S;; be the sample covariances as defined in Equations 
(33) and (34), respectively. Let S’’ be the elements of the inverse of 
the covariance matrix || S;; ||, 2,7 = 1, 2,---,m. According to Equa- 
tions (35), (16) and (17), the agen tne BG. ) are functions of B(d) 
and 6(¢) and thus contain all the parameters involved in a model. Let 
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the parameters be denoted by 6 = (6, , 6, °-- , 9.), and let 
(89) BZ) = "G6, 4 Oy op OP), lots = ee 


with ¢; denoting the functions. The object of the methods of estimation 
is to minimize a quadratic form 


(40) Q= 2d 2, [Z; — ¢(ONZ; — £;(]S" 

to obtain estimates of the parameters. It is shown in [2] that as m, the 
number of replicates, tends to infinity, the estimates derived by mini- 
mizing the quadratic form Q are consistent, asymptotically normal and 
asymptotically efficient. The estimates are called regular best asymptot- 
ically normal estimates (? BAN estimates). 

Minimization of the quadratic form Q may take place either with 
respect to the parameters 6, or with respect to the expectations ¢. We 
shall discuss them separately. 

(A) Minimization of Q with respect to the 6. When expectations are 
linear functions of the parameters, minimization of Q may be made with 
respect to the @’s and the problem is the one of least-squares estimates 
in the Gauss-Markoff Theorem [3]. When expectations are not linear 
functions of the parameters, the Gauss-Markoff Theorem is not directly 
applicable. In such cases, however, we may first linearize the functions 
¢,(0) at a selected point, say 6 = (6, , 6., --- , 6,), and then substitute 
the linearized functions of ¢; for ¢; in (40), and finally, minimize the 
resulting quadratic form with respect to the parameters 6’s to find the 
estimates. The procedure is outlined as follows. 

First, select s sample averages Z,, , Z.,, --: , Za, and set 


(41) Lge Ca NO hs 5 a ea ys LOT =k te peer 


solve Equations (41) for the parameters and let the solutions be denoted 
by 6 = (6,, 4, --: , @,), which may be called pilot estimates of the 
parameters. Secondly, write 


54(0) ma (8) = 2d bin Oy = 6), for t= 12. BO On 


as linear approximations to the expectations ¢,(6). Here ¢,(6) = 


rer Cp 8, ee 6,) are the values of the expectations taken at the pilot 
point (0,90; 5 ---, @,). and 
06 ;(0; 05 7 pied a] ) a —— il 2 See nN 
bs = ) ? d 8 ) ? ? 
k 38, re for : 


biel ee eae S: 
are the values of the derivatives of the expectations with respect to 
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6, taken at the same point (6,, 6, --- , 6,). Thirdly, substitute ¢*(6) 
for ¢;(6) in (40) to obtain a eae fae of Q; say Q*, 


(42) ahiee De 3 [Z; — ¢*(O][Z; — ¢4(@)]S". 


Finally, minimize Q* with respect to the 6’s to find the estimates. 
The estimates are given by 


(43) 6, = Cm: >: (GES ‘> +S b8°Z* : for k = tees SO 8 
h=1 i=1 j=1 


Here C,,, is the cofactor of c¢,, in the determinant 
C= lez |, | Ee oes 8 Dp ar Sea 


with elements 
sn = 2 3 baS’b;,, for hk =182, -.5,s 
and a 
Zt = Z; — ¢(8) + by Bipegie He: (Gaeenl De ieee 71; 
In the independent case, we minimize the quantity 


n 8 2 
(44) fe (2: = yy bil) Se 
t=1 h=1 
instead of (42). This is an ordinary least-squares problem and the 
estimates can be obtained in the usual way. 

(B) Minimization of Q with respect to the ¢. The estimates obtained 
by the preceding method depend on the choice of the sample averages 
Z..,Za,,°** , Za, , although in certain cases the final result does not 
vary much from using one set of sample averages to another (Section 6). 
A second method is suggested here when n, the number of observations, 
is small; this method does not require selection of sample averages. 

In this method, the quantity Q in Equation (40) is minimized, with 


respect to the expectations, to obtain estimates [, , 2, +--+ , §, and 
then the equations 

(45) = Owes go Os), for io 1, 2, <5, My 

are solved for the estimates 6, , 6., --- , 6,. The n expectations ¢; , 


however, are functions of s parameters and thus they are not all inde- 
pendent. Their dependence may be expressed in terms of a system of 
nm — 8 = r equations, say, 


(46) EGG Sar °° 9 Se), = 0; a FP AN 


90 BIOMETRICS, MARCH 1957 


These equations may be regarded as side restrictions, and the expecta- 
tions ¢; must be subject to these side restrictions before estimation 
takes place. Equations (46) are deduced by eliminating the parameters 
from the Equations 


(39) C= CO) On oa) for ees ipa Oro te 27 e 


The estimates obtained by minimizing the quadratic form Q with respect 
to the ¢’s subject to the restrictions (46) are identically equal to those 
found by minimizing Q with respect to the parameters 6 directly. 

In some cases, Equations (46) are linear in the ¢. Then they may 
be used in the next step in the process of estimation. Ordinarily, how- 
ever, Equations (46) are not linear. In these cases we replace Equations 
(46) by their ‘reduced’ form. Denote by F(Z) the functions 


Fit, fo, 2°* , §,) evaluated at the point (7,52. 15 Zao Una 
EAL) = FAM Dou ele fore = 2 eat 
Denote by d,, the derivatives of F..(¢; , 2 , --- , &n) with respect to ¢; 
evaluated at the same point (Z, , Z., --- , Z,), 
; OF AG Sat Ge 
‘a Of; faz) 


Then we expand the functions F’, at the point (Z, , Z., --- , Z,), take 
the first two terms of the Taylor expansion and set them equal to zero 
to obtain the ‘reduced’ form of the side restrictions 


(47) Fs, Z) = FZ) + BW: — Z,)d,; = 0, for wu = 1,2, --- ,r. 


Minimization of the quadratic form (40) is then made with respect to 
¢; subject to the ‘reduced’ form (47) instead of to the original side 
restrictions (46). The resulting estimates of the expectations are given 
by 


T ty 


(48) Gig IE OSs DS, Pals for nas Oe ps 
a 


7=1 u=1 v= 
where P,,, is the cofactor of the element p,, in the determinant 
a Dal hy Do Leo meena, 
with elements 


Puv = SS duiSisdy; ) U, Vv, => iL 2 eile! fe) lis 
=1 


a7=1 j] 


Estimates of the parameters are obtained from the relationships (45). 
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5. PARTICULAR FUNCTIONS OF EGG FECUNDITY AND EGG CANNI- 
BALISM 


To illustrate the application of the preceding methods, let us consider 
particular functions of egg fecundity B(¢) and egg cannibalism 4(é). 
It is known in experimental work that the female beetle lays eggs 
throughout most of her lifetime, although her egg fecundity varies 
with her age. Immediately after her emergence from a larva, a female 
beetle’s ability to produce eggs is low; her egg fecundity increases as 
she grows older. After she has reached her most fertile days, her 
reproductivity declines, and she lays very few eggs during the closing 
days of her life. In line with this reasoning the following function A(é) 
is suggested, 


(49) B() = (a + bbe. 


Since A(t) corresponds to a probability, the constants a, b, and c are all 
non-negative. According to Equation (49), at initiation of the experi- 
ment, 8(0) = a. As time increases, @(¢) increases and it reaches its 
maximum of (b/c) exp {—(1 — ac/b)} att = (b — ac)/bc. Thereafter, 
the function 6(¢) decreases and approaches zero as ¢ approaches infinity. 
Since a beetle lives a limited period of time, according to this function 
there is a positive probability that a female beetle will lay eggs at the 
closing moment of her life. 

For simplicity, we assume constant egg cannibalism. ‘That is, 
6(t) = 6. The parameters involved in this model are then a, b, c, and 6. 

To estimate the parameters by the methods described in Section 4, 
we need to deduce expressions for ¢; , the expectations of the average 


number of eggs observed at time ¢ = i7, forz = 1, 2,--:,mn. For the 
first type of experiment, the expectation is given by 
(85) C; = E(Z;) = Zp: +A; ; for = 12, ae Nn, 


with p; and ; defined, respectively, by (16) and (17). Under the 
assumption (49) respecting egg fecundity and the assumption of constant 
cannibalism, we have 


Da OyactOrata= 1, 2 o* 1, 
and 
r, = (6; — 06)0, + 1610,, for i= 1,2,--: ,n, 
where 
uate (org kbr ka - kb 
pea oe pits parr, and Ge ars © PRE 
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It follows that the expectations ¢; are given by 
(50) & = 26, + (6: — 6:)6, + 7616,, for += 1,2,--- ,n. 


As was pointed out in the preceding section, the estimation of the 
parameters may be made by either of two methods. The first calls for 
linearization of the expectations ¢; about a selected parameter point, 
whereas the second relies on the derivation of side restrictions to be 
imposed on the expectations. In the experimental work considered, 
n, the number of observations is large, and thus it would be laborious 
to use the second method to estimate the parameters involved. We 
shall therefore concern ourselves mainly with the first method and shall 
only briefly describe how the side restrictions may be deduced for the 
second. 

For the first method of estimation, we first select a constant 7 and 
a constant h and consider the time of observation ¢ = 77, (¢ + h)r,--:, 
(¢ + 4h)r. The average numbers of eggs observed at these time periods 
are used to find the pilot estimates of the parameters. It is easily seen 
from (50) that 


— O08 s+0n = (81 — 60) 617°", 

+ [+ @ + Dh}e: — @ + gh) eile, 7 e, | 
for g = 0,1, 2, and 3. The first three equations of (51) imply 
(52) (£561 — Qos4n6t +H Si42n) 60 — (Ceen1” — WisonOt + Sevan) = 0, 
and the last three equations of (51) imply 


(51) Geer sey 


(53)> Cyn Gi — 25a tan ie Caran) On Cuan = Gee 6 eee en 


Solving each of Equations (52) and (53) for 6) and equating the two 
expressions, we have 


(54) Wait + Wai6 + Wed” + Widi + Wo: = 0, 
with 

War = CiSrsan = Seas 

Was = 2Sesnksson — O5Ee000), 
(55) Was = SiSieran + WiarSsrsn — B5ia2, » 

Was = 2WCssaSicen — Siarbevn), and 

Wor = SisanSivan — Sisan - 


STOCHASTIC PROCESSES 93 


With the corresponding observed averages replacing the expectations, 
a pilot estimate of 6, , say 6, , can be obtained from (54). Substitution 
of 6, = 4, in Equation (52) gives 6) . The quantities 6, and 6, are 
then used in Equations (51) to compute 6, and 6,. Once pilot estimates 
are obtained, we can follow the procedure described in Section 4 and 
linearize the expectations ¢; about the point (6) , 6, , 6. , 63). The 
estimates 6) , 6, , 6. , and 6, are then obtained from Equation (43). 
Calculation of the estimates of a, b, c and 6 can be made easily from 
Gas Oe, 0, and 65: 

In using the second method of estimation, we need to deduce the 
side restrictions (46) imposed on the expectations ¢; by eliminating the 
four parameters from (50). The first part of the procedure of elimination 
is similar to that described for the first method. We may start with 
Equation (50) and follow the procedure described there until Equation 
(54) is reached. Now we let h = 1 and have the corresponding formula 


>; = W20t — Waid} “fe Waid si Maser eh ery 0, 


for? = 0,1, 2,--- ,n — 4,-where the W’s are defined in (55) with h = 1. 
The side restrictions can be obtained by requiring that the same value 
of 6, satisfies two equations, say ®; = 0 and $,,, = 0. To achieve this, 
we write 


(56) 67, = 0, and 67%;,,=0, for g = 0,1,2,3, 


fort = 0,1,2,---,n— 5. For each 7, (56) represents a system of eight 
equations and they are, so to speak, generated from the two equations 
6, = Oand 4;,,; = 0 by multiplying each of the two equations through 
by unity, 6, , 0; and 6} , respectively. For example, for g = 3, 


6:®; = Wi; + W301 ata Wi; e Wid x2 Woidi = 0. 


We may regard 6, , 0: , --- , 6; as seven unknowns. In order that, for 
a fixed 7, the system (56) of eight equations be satisfied by the same 
value of 6, , it is necessary and sufficient that the determinant of the 
coefficients W be equal to zero. Since the W are functions of the 
expectations ¢; , the n — 4 determinantal equations represent n — 4 
required side restrictions imposed on the expectations. 


6. IDENTIFIABILITY OF PARAMETERS AND DEPENDENCE ON 
SAMPLE AVERAGES 


It has been noted that the first method of estimation depends on the 
choice of sample averages to obtain pilot estimates. One should then 
investigate the degree of dependence of the final results on the sample 
averages, especially when m, the number of replicates, is moderate. 
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The investigation can best be made by fitting a theoretical model to a 
set of empirical data. Dr. Thomas Park of the University of Chicago 
kindly made available to us the data from a study of Dr. C. L. Kollros [4]. 
Because of the way the experiment was carried out, our investigation 
led to an important point. This is the identifiability of parameters 
involved in a model. 

In Kollros’ study the experimental unit was a vial containing k = 1 
pair of Triboliwm castaneum. The beetles were less than 24 hours old 
at initiation of the vials. Egg counts were made at intervals of 7 = 3 
days until each vialed female died. After each count the eggs were 
removed from the experiment and the same pair of beetles were returned 
to the vial containing fresh medium. Since the eggs were removed after 
each count, Kollros’ study belongs to our second type of experiment. 
Our first attempt was to fit the model suggested in Section 5 to the data. 
However, because of the removal of eggs, the observed number of eggs 
is a result of both egg-laying and egg-eating processes in the correspond- 
ing period and such information cannot be used to identify separately 
egg fecundity and egg cannibalism. This may be shown as follows, 

The distribution of a random variable is determined by its generating 
function. For this type of experiment, the generating function of the 
random variable Z; , the observed number of eggs at time ¢ = 17, is 
given by [Eqs. (8) and (28)], 


Gz,(u;) = exp {—(1 — u,)A¥}, 


which depends only on A# . The quantity \* , according to (22), has 
the expression 


aT 


Mak | BO exp {-k / 5(T) an dt. 
4—1) 7 t 

It may be argued that the integrand is essentially a single function, and 

that it is only a matter of opinion as to which part of the integrand 

stands for egg fecundity and which part for egg cannibalism. In fact, 

any two sets of functions, say, 6:(t) and 6,(t), and 6,(é) and 6,(¢), satisfy- 

ing the relationship 


Bild) 
B2(t) 


will lead to the same \¥ . Since the generating function of Z; is com- 
pletely determined by \* , according to the concept of identifiability [5], 
the functions B(¢) and 6(¢) and the parameters involved are not identifi- 


able. Consequently, egg fecundity and egg cannibalism cannot both 
be estimated with Kollros’ data. 


= expk f {a(0) — a(T)} a7, 
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In order to use Dr. Kollros’ material to investigate the dependence of 
final results on the selection of sample averages, we modify the assump- 
tions on egg fecundity and egg cannibalism to the following: 


(57) B(t) = (a + be) exp {—ct} and 4() = 0. 


Assumptions (57) should not be interpreted to mean that egg canni- 
balism is zero. Rather the function 8(¢) should be taken to stand for 
the difference between egg fecundity and egg cannibalism, or ‘net’ 
fecundity. 

The formula for \¥ corresponding to these assumptions (57) is found 
to be 


(58) A* = (i — 1)6;-*@, — 10/0. + 6;-'0; — 0/6, , fori = 1,2, --- ,n, 
with 


=o ka. kb 
6, =e ; f=» and eS Saree 


The quantity \¥ is of course also the expectation of Z,; , the average 
number of eggs observed at time ¢ = 77 over all the replicates. That is, 
A* = ¢; . To use the first method of estimation, we need to find pilot 
estimates of the parameters. Equations (58) are linear in 6, and 6; for 
a given 6, . Therefore, the main job is to use sample averages to esti- 
mate 6, . This can be done easily by observing that, whatever be 
non-negative 7 and h, with 7 + 2h S n, (58) implies 


(59) Cjs2n QF ;40 ae Oe = 0. 


Using proper sample averages in place of ¢; , ¢;,, , and [;,2, , we have 
pilot estimate 6, from (59). The quantity 6, is then used to obtain the 
corresponding pilot estimates of 6, and 6, from any two of the three 
equations of (58) forz = 7,7 + h,j + 2h. The remaining part of the 
procedure of estimation follows the steps described in Section 5. 
Figure 1 shows the results of fitting the model characterized by (57) 
to the experimental data of Dr. Kollros. The scatter diagram stands 
for the average number of eggs over m = 42 replicates observed in the 
course of the experiment. A total of n = 50 averages were computed 
and the corresponding points were plotted. To investigate the depend- 
ence of the final results on the choice of sample averages, two sets of 
sample averages were used. The first set is composed of Z, , Z, , and 
Z,; , and the second set is formed by Z, , Z:1: , and Z,, . The estimates 
of the expectations ¢; were computed from (58) for each of the two 
cases and they are represented by the two curves, respectively, in 


Figure 1. 
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Sample averages used 


Number of eggs 


Age of female (in days) 


FIGURE 1. 


Ace Speciric ‘Nev’ Fecunpity or Tribolium castanewm OBSERVED BY C. L. KoLuros. 


The significance of Figure 1 is that although the two sets of sample 
averages are quite different and the scatter diagram shows a large varia- 
tion, the resulting curves are remarkably close to each other. It is true 
that, judging from either one of the two curves, the stochastic model 
characterized by the assumptions (57) does not fit the empirical data 
well. This very fact of ‘‘bad fit’’, however, makes the closeness of the 
two curves even more noteworthy. For if the observed data had shown 
a definite pattern and if the pattern were well represented by the model, 
one would anticipate a good fit of the model to the data regardless of the 
choice of sample averages, and the closeness of the two curves would 
not be surprising. To put it differently, if the model had satisfactorily 
described the underlying pattern of the egg-laying-egg-eating processes, 
one would anticipate that the resulting curves would be even closer to 
one another. This result thus leads us to conclude that although the 
first method of estimation depends on the selection of sample averages 
to obtain pilot estimates, the degree of dependence is negligible. 

As a final remark, we wish to point out that in this paper the methods 
of estimation are discussed in general terms and that they can be applied 
to other problems. The problem of estimating egg fecundity and egg 
cannibalism cannot be solved, however, until we have the right kind of 
experimental data on the one hand and an acceptable stochastic model 
on the other. Kollros’ data are not suitable for this purpose and, in 
the light of Figure 1, the particular model suggested in Section 5 probably 
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is not an acceptable one. Further work in both experiment and analysis 
still needs to be done. Also the ‘‘net”’ fecundity, as shown in Figure 1, 
appears to have more than one mode. If the multimodal pattern is 
real, it would be extremely important and interesting to have a biological 
explanation for its existence. 
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THE USE OF AFFINITY IN CHROMOSOME MAPPING* 


Marcaret E. WALLACE 


Department of Genetics, University of Cambridge, England 


1. INTRODUCTION 


The subject of chromosome mapping is becoming of increasing 
importance to the geneticist. In the past thirty years, and particularly 
in the last ten, a very large number of genes have been discovered in 
various organisms; and as new linkages are found for them, it becomes 
necessary to construct an adequate theory describing their inter- 
relationships. Such a theory must supply the means of specifying the 
frequencies of the modes of gamete formation of an animal showing 
simultaneous segregation of genetic markers at any number of linked 
loci,—these frequencies to be specified in terms of the positions of the 
genetic markers on the chromosome in relation to each other and to 
the centromere and termini. 

The only observable quantity is the amount of recombination ex- 
hibited by the sets of genes in such an animal—one would obtain the 
data most simply by backcrossing a multiple heterozygote to the 
multiple recessive. The recombination value, y, depends on the number 
of chiasmata occurring during the meiotic divisions of gamete-formation. 
A chiasma is a configuration of the four homologous strands observable 
at the pachytene stage of meiosis and occurs when the members of 
these strands twist together and break at various points, the broken 
ends joining up with different partners. These points may thus be 
regarded as points of exchange of chromosomal material, resulting in 
recombinations of the genes contained in them. So, y can be defined 
for two loci as the probability of an odd number of exchange points 
between them. Thus y cannot be used as a measure of distance (i.e. 
as a metric), but it is possible to construct a suitable metric based on 
the number of exchange points. Such a metric has been proposed: 
viz. that x, the average number of exchange points per strand between 
two loci, over a large number of meioses, is a measure of the distance 


*Context of a paper given at the Annual Meeting of the British Region of the Biometric Society 
on December 12th, 1955. 
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between them and is additive for several loci. If it is assumed that the 
exchange points occur at random along the strand, then the relation 
between the y values for different segments is quite simple: put forward 
first by Trow [1913] for two segments, it can be written Yi2=YWty- 
2Y1Y2 , Where y, , y2 and y,.,. are the observed recombination values over 
the first segment, the second segment and the interval combining the two 
segments, respectively. The relation of y to x under this assumption 
is given by Haldane’s [1919] formula 2y = (1 — e7’). 

However, the assumption of independence in the distribution 
of exchange points is rather an artificial one. For some cytologists 
explain the behavior of the chromosomes at meiosis on a mechanical 
basis; and it seems reasonable, on this basis, to suppose that the occur- 
rence of a chiasma at one point relieves tension along the strand, and 
that consequently the next chiasma will occur some distance away 
from it: that, in other words, there is interference between chiasmata. 
Whatever the explanation may be, in fact both cytological and genetical 
evidence support the idea of interference, and show that it is usually 
positive, i.e. that the occurrence of a break at one exchange point 
tends to inhibit (rather than to promote) the occurrence of others 
near it. This complicates the relation of y to x, and the relation between 
y values in the same chromosome; Trow’s and Haldane’s formulae 
have been rejected, and it is not yet clear exactly what should replace 
them. 

The only point at which these formulae might have been expected 
to hold is the centromere. Both cytological and genetic evidence 
favours the view that there is no interference across the centromere— 
that is, that the centromere acts as a sort of buffer between the two 
arms. Thus at this point the relation y,,. = Yi + Y2 — 2y:Ye is ade- 
quate. However, the kind of data obtained by linkage experiments 
with higher organisms does not supply a direct way of determining 
the position of the centromere. One can only locate it by deriving the 
interference relations from such data and by placing it in the position 
most in accordance with the theoretical implications of these inter- 
ference relations. Thus the centromere-position is not available as a 
premise on which an interpretation of linkage data in terms of inter- 
ference may be made. It is in this context that affinity data are of 
particular use, for they do supply direct information on the position of 
the centromere; and thus affinity reinstates Trow’s and Haldane’s 
formulae. 

Finally, both genetical and cytological evidence is that interference 
is low for segments near the centromere and that it increases for seg- 
ments at successively greater distances from it. It is not at present 
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clear whether the terminus of the arm affects the intensity of inter- 
ference in segments near it, but it may well have a small effect. Thus 
there is need for an interference metric by which the degree of inter- 
ference may be specified for any two or more segments at all positions 
in the chromosome, and which takes account of the effect both of the 
centromere and of the terminus of the arms. 

The development of such metrics has been mainly due to Sir Ronald 
Fisher and A. R. G. Owen. In 1947 Fisher, Lyon and Owen proposed 
that the frequency distribution of lengths of intercepts between suc- 
cessive breaks in single strands is of the form 


sech (47u) tanh (42u) d($zu) 


where u is the metrical value. Owen [1949, 1950] found that the fre- 
quency distribution of 4x is very similar to that given by Fisher, and 
is computationally more manageable. Owen also found [1948] that it 
fits very well all the available Drosophila data. What are now needed 
are data from well-designed and comprehensive tests—that is, concerning 
the simultaneous segregation of markers at three, four or even five 
loci,—in other organisms, so that this, and other metrics, may be 
tested against them. We are at Cambridge endeavouring to supply 
such data for the only mammal on whose chromosomes a reasonable 
number of markers is known, namely the house-mouse. 


2. THE DISCOVERY OF AFFINITY 


It was while I was working on a linkage experiment of this kind, 
that I found evidence for the new phenomenon—which has been called 
affinity—having a direct bearing on the mapping of chromosomes, 
and particularly on the mapping of the centromere. 


TABLE 1 
Quast-LinkacE or W with CHromosome V MARKERS 


We—rae W—ft W — Sd 
WA VE tae US oh a ae an Vie A Se ES ae 
at a at a ap ae oP Sd- + Sd + 
C.b.cs 42 41 27 25 185) 129 65 102°108 404 | 180 152 138 171 641 
R.b.cs 64 72 53 61 250} 28 26 35 19 108| 15 384 29 927 4105 
N;R ; 192 : 193 298 ; 214 414 : 332 
x71 31 0.0026 13.7813 9.01 
for 1 d.f. eats 
R-f., % 50.1299 41.7969 44.5040 


Key: C.b.c. and R.b.c. stand for Coupling and Repulsion backcrosses respectively. 
N = non-recombinants, R = recombinants. 
R.f. = recombination fraction. 
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In 1950 I noticed that the marker W (dominant pied), of linkage 
group III, was showing linkage-like associations with the three markers 
a‘ (tan belly), fz (fidget) and Sd (Danforth’s short tail), all in linkage 
group V. On a comprehensive compilation of the data in 1951, I 
observed the quasi-linkages shown in Table I which I felt were too 
significant to be ignored and too inexplicable in terms of ordinary 
linkage to be accepted as such. 

The association of W with the linkage group V markers is inexplicable 
in terms of ordinary linkage because the five observed recombination 
values have a non-linear relationship, so that W cannot be placed any- 
where in linkage group V without calling upon extravagant notions 
about interference (Figure 1). 


Linkage group I: W 


Linkage 
group WZ: at fi Sd 
FIGURE 1 
Association or W witH a*, fi, anv Sd 


There seemed no alternative to the conclusion that linkage groups 
III and V refer to different chromosomes, but that some point, at 
least in V, is responsible for the association with W. This point was 
thought to be situated nearest to fi, since the W-fz linkage value is 
the smallest. 

At this juncture, Professor Sir Ronald Fisher received privately 
from Dr. Donald Michie, of Oxford, an account of a theory which, if 
experimentally confirmed, would explain the linkage-like association 
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of four independent markers observed by Gates [1926] in backcross 
progeny of a cross between Mus musculus and Mus bactrianus. Michie’s 
hypothesis was that, at the first meiotic division in the hybrid, musculus 
centromeres tend to go to one pole and bacirianus centromeres to the 
other, and that this tendency results from an attraction of centromeres 
of like origin either for each other or for some polar element of the cell. 
Sir Ronald suggested to me that the association I had observed might 
have the same cause. In this case, the point in linkage group V re- 
sponsible for the W-V quasi-linkages would be the centromere, and 
the observed quasi-linkages would be composite, being due partly to 
the linkage of the chromosome III and V markers with their centromeres, 
and partly to an attraction between these centromeres (Figure 2). 


W Ci 
0 . 
a ee eee 7 Composite 
ss oe ek emai el recombination 
| values<——> 
Ll 
Sim ay Ok SA Oe IE tees 10 
pe eee Se a eee FS e end 4 Siegal. a 
at fi G Sd 
FIGURE 2 


{;, Tue Rewation or W wits at, fi anp Sd on THE Basis or “AFFINITY” 


Since the centromeres, in the generality of organisms, appear to be 
the primary movers in the passage of the chromosomes from the equator 
to the poles of the dividing cell, it seemed reasonable to suppose that 
in both phenomena, that of Gates observed in crosses between species 
and my own observed in crosses within the species, the agents were in 
fact the centromeres. 

Michie found further evidence of quasi-linkages in crosses between 
species in the work of Little [1927] and of Green [1931], which was, 
on the whole, corroborative; but the theory required the proof of direct 
experimentation. Meanwhile I found, in my own mouse records, new 
quasi-linkages from crosses within laboratory stocks; and I set up further 
tests. ‘These were immediately fruitful in the production of a very sig- 
nificant quasi-linkage between Sd and Ca (caracul) in linkage group VI. 
At Sir Ronald Fisher’s suggestion, the term “affinity” was adopted by 
both Michie and myself [1953]. A preliminary report on other such 
quasi-linkages has been given (Wallace [1954]), and a full report will 
be given shortly. 
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I also embarked on a thorough investigation into what may be called 
the W-V association. This was intended to demonstrate four kinds of 
quasi-linkage. These are the four kinds possible on a model of affinity 
which assumes that there are only two kinds of centromeres and that like 
ones attract. Two of these kinds of quasi-linkage show recombination 
exceeding 50%, itself a feature distinguishing affinity from chromosomal 
linkage. It was also intended—and this was the easiest, and actually 
an unavoidable part of the programme—to demonstrate independence of 
the two linkage groups. A phenomenon which produces, under specifi- 
able conditions and with predictable frequencies, segregations of both 
kinds, linkage-like and independence, cannot be interpreted as chromo- 
somal linkage. The programme was thus intended to produce features 
distinct from linkage—and also, it was hoped,—distinct from any other 
known phenomenon. 

The four kinds of quasi-linkage are expected from the four types of 
double heterozygote listed in Table 2. 


TABLE 2 
Tue Tyres or HeTeROcENTRIC HETEROZYGOTE WITH Two SincLty MARKED 
CENTROMERES 
Apparent 
Type Chromosomes Phase of Abbrvn. segregation 
Vs Markers Centromeres| for het. in offspring 
; Aa Ba ; ; 
(i) —= == Coupling Convergent CC < 50% Coupling 
aB bB 
(ii) ae we Repulsion Convergent RC < 50% Repulsion 
ag BB 
(iii) oe ae Coupling Divergent CD > 50% Coupling 
agp ba 
(iv) we iy Repulsion Divergent RD > 50% Repulsion 
aB Ba 


Key: Symbols for markers are in Roman letters. 
Symbols for centromeres are in Greek letters. 
A convention is used here by analogy with that often used for chromosomal linkage, of writ- 


ing centromeres from the same parents side by side, and of writing homologous chromo- 
somes one above the other. 


In this table, only one marker on each of the non-homologous 
chromosomes is shown. Actually I kept both fz and Sd segregating, 
and, whenever possible, a‘, so that the data from any animals which 
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did show quasi-linkage, could be used for mapping purposes. Despite 
a bad outbreak of mouse catarrh, I obtained well balanced data for all 
the quasi-linkages W-f7, W-Sd and W-a'; and the data from my three- 
point linkage backcross supplied accurate observations for the three 
chromosomal linkages a‘-fi, fi-Sd, and a‘-Sd. 

Full details of the data from both the three-point backcross and 
the W-V investigation will be published shortly (Wallace [1957]): a 
preliminary report of the latter has already been given (Wallace [1954]). 
A critical analysis disposes of all alternative hypotheses which might 
explain the data, and strongly supports the hypothesis of affinity. More- 
over, there is good evidence that five of the tested heterozygotes were 
“coupling convergents” and that two were “coupling divergents’’. 


3. THE USE OF AFFINITY IN MAPPING 
The data from these seven ‘“‘affinity”’ animals consist of 716 progeny 
from males and 90 from females (Table 3). 


TABLE 3 


® 
A SUMMARY OF THE SIMULTANEOUS SEGREGATIONS OF VW, fi, AND Sd FROM THE SEVEN 
“APFINITY” HmTHROZYGOTES 


Origin of data Mode of gamete formation and observed frequency 
1. Male 

heterozygotes 
Total data (P) (W) (Sd) (fi) Total 
ignoring a‘ locus 341 236 74 65 716 
Total data (P) (a!) (W) (Wat) (Sd) (Sdat) (ft) (fiat) Total 
including a‘ locus 71 84 1382 # 55 48 13 5 37 545 
2. Female 

heterozygotes 
Total data (P) (W) (Sd) (fi) Total 
ignoring a‘ locus 46 23 10 11 90 
Total data (P) (at) (W) (Wat) (Sd) (Sdat) (fi) (fiat) Total 
including a* locus 13 10 8 5 a 1 2 5 48 


Key: The heading (P) denotes the mode of gamete formation which produces wholly non-recombi- 
nant chromosomes. The headings (W), (Sd), or with any other mutant symbol, each denote 
that mode of gamete formation which exchanges, in the multiply-heterozygous parent, the 
mutant indicated with its normal allele. The figure beneath each heading denotes the 


observed number of progeny produced by that mode. ‘This notation was first d 
Wright [1947]. soe, 


These data can be used in mapping as follows: Trow’s formula can be 
written in the alternative form: 


(1 — 2y,42) = (1 — 2y:)(1 — 2y2) (1) 
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In the W-V association we have a conceptually linear system of six 
points: 


W c! 
chromosome IIL 


chromosome WZ 
a fi C Sd 


If, as I have indicated as a reasonable supposition, the “point of attrac- 
tion” is the centromere, and there is no interference across it, three 
equations of the form of (1) are available: 


(1 — 2W-Sd) = (1 — 2W-C’)(1 — 2C’-C)(1 — 2S8d-C) (2) 

(1 — 2W-fit) = (1 — 2W-C’)(1 — 2C’-C)1 — 2fi-C) (3) 

(1 — 2W-a') = (1 — 2W-C’)(1 — 2C’-C)\(1 — 2a'-C) (4) 
where W-Sd = Yw_sz, W-C’ = Yw-c: , etc., i.e. the symbol y has been 
omitted for brevity. The terms on the left are derived directly from 
the three observable quasi-linkage values. Each concerns a particular 
V marker and is proportional to a corresponding term on the right 
concerning the unknown chromosomal linkage value between the V 
marker and the centromere. This being so, the actual value of these 
unknowns can be found with the further aid of the observed chromosomal 
values between the V markers. 
a) Ignoring Relations With a’ 


The relation for chromosome V corresponding to (1) above is 
(1 — 2fi-Sd) = (1 — 2f7-C)(1 — 28d-C) (5) 
From equations (2), (3) and (5), the following identities are available: 
(1 — 2fi-C)’=(1 — 2W-fi)(1 — 2fi-Sd)/(1 — 2W-Sd) (6) 
(1 — 2Sd-C)’=(1 — 2W-Sd)(1 — 2fi-Sd)/(1 — 2W-fz) (7) 
(1 — 2W-C’)?(1 — 2C-C’)’=(1 — 2W-fi)(1 — 2W-Sd)/(1 — 2ft-Sd) (8) 


Thus it can be seen that the recombination fractions fi-C and 
Sd-C can be obtained without knowledge of the separate values W-C" 
and C’-C; and also that the value of the compound W-C’-C is calculable. 
Although the ingredients of this compound cannot be determined, an 
upper limit, namely the whole value of the compound, may be set to 
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the attraction-value C’-C. Further, if C’-C should vary from mating 
to mating, this fact does not disturb the values fi-C and Sd-C’ which 
are estimated without using it. Hence all matings showing quasi- 
linkage can be pooled for the estimation of centromere position in V, 
even though C’-C may be different for each of them. 

For the present, quasi-linkage and linkage relations with a’ are 
ignored for mapping purposes, except as a check upon the position of 
centromere V as calculated without it. This is because the observed 
quasi-linkages with a‘ are so near 50%, that enormous error would be 
involved in any calculations using them. But for the case where this 
is not so, it is worth noting below the appropriate mapping procedure. 


b) Including Relations With a’ 


Two further equations of the form of (5) are available, making six 
equations in all for the estimation of four unknown parameters. There 
are thus two degrees of freedom available for testing goodness of fit. 
A maximum likelihood method of estimation of all recombination 
values can therefore be used. 


Mapping With Two Doubly Marked Chromosomes 


It may be of interest to note here the appropriate relations for the 
case where both chromosomes have at least two markers. Material 
involving several chromosomes is being built up for this purpose: one 
stock, with W, fi and Sd, includes /x (luxate) as the second chromosome 
III marker. With these four points, four equations of the form (2) 
and (8) are available, and these provide the relation 


(Us s2ler ie a We) 
(1 — 2lx-Sd) (1 — 2W-Sd) (9) 


This relation is a crucial test of affinity, because it holds only if the 
“centromeres” are fixed points and if there is no interference across 
them; therefore if it does hold, this is evidence that the points involved 
are the centromeres. 

These four equations, together with a further equation of the form 
of (5) now available with lz, and with equation (5) itself, provide the 
relation 


; 1 — 2la-fi)(1 — 2le-Sd)\(1 — 2W-fi(1 — 2W-S 

1 — 2¢/-¢)' = = 2h fl — 2le-Sd (1 — 2W-fi(1 — 2W-Sd) 

( ) (1 — 2le-W) — 2fi-Sd)? (10) 
This determines C’-C whose value, used in appropriate equations of 


the form of (8), allows the centromeres in both chromosomes to be 
mapped in relation to their markers. Alternatively, equations (6) and 
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(7), and two further equations of this form now available, may be used. 
4. THE MAP OF CHROMOSOME V 


Returning to the affinity data available from the W-V investigation: 
using the data from the seven affinity animals for the quasi-linkages 
W-fct and W-Sd, and substituting them and the chromosomal value 
fi-Sd in the appropriate formulae, we obtain: 


ae f CG Sd 
arin 13.26% 9.92% 


fi CG Sd 
bo 10.74% I6.94% 


The relations with a’ when used only as checks, are quite satisfactory 
in confirming the order fi-C-Sd as given above. 

It should perhaps be pointed out that with this locus ignored, there 
are only three observable variables from which three parameters may 
be estimated. Thus there are no degrees of freedom for measuring 
goodness of fit. However, it is possible, by means of a x’ test, to obtain 
fiducial limits for the location of the centromere. This x” may be 
designated x”A for reference, and is obtained by the following argument. 
From equations (6) and (7) 


1 — 2Sd-C _ 1 — 2W-Sd 


C= hy -f ef 
The ratio 
(EOE el 
1 — 2fi-C we) 


gives a linear relation between the frequencies of the four modes of 
gamete formation observable in the affinity data. Thus if a, b, ¢, d 
correspond with the observed frequencies 


(P) (W) (Sd) (ft) 
341, 286, 74, 65 


respectively (given in Table 3), then 
1—2W-Sd a—b—c+d_ 96 


en) 


1—-2W-fi a—b+e-—d 114 


108 BIOMETRICS, MARCH 1957 


The relation 


a—b-c+d 


ee eer nT: 


may equally be written 
A—Da-A—D)b+At+ )e—-At+)d =), (13) 


implying in this form that a linear function of the frequencies observable 
should be zero for each value of \ in terms of which the coefficients of 
the function are expressed. For any arbitrarily chosen value of ) the 
observed frequencies will show some discrepancy, and the significance 
of this may be easily determined since the sampling variance of any 
linear function of observable, mutually exclusive, frequencies is always 
known (Fisher [1954], p. 307). 

In general, the sampling variance of any such linear function of 
observable frequencies, the expected value of which is zero, is expressible 
in terms of the expectations, so that 


V3) = (eG) a) Geo) 


where a, 8, y, 6 stand for the values of a, b, c, d expected. 
Hence 


{(a-DA-D+e-ADA+ DD)’ 
A-—Wate~—+aA4+) 74+ 8h 


is distributed as x° for 1 degree of freedom. In this expression the 
marginal values (a + 6b) and (c + d) may be substituted for the expecta- 
tions (a + 8) and (y + 6). With x’ values corresponding to chosen 
significance levels, the limits of \ tolerated by the affinity data may be 
found. Using relations (12) and (5), the limits of the values Sd-C and 
fi-C may then be determined. Thus, when a x’ value of 3.841 (for the 
5% level) is used, the limiting values of \ for the male data are 0.47 
and 1.34. The value of \ when the centromere is assumed to be exactly 
at the fz locus, is found by using equation (12); zero is substituted for 
the recombination fraction f7-C, and the value for fi-Sd as found from 
Table 3 is substituted for the recombination fraction Sd-C. This is 
0.62. Since the lower limiting value 0.47 is less than this, it may be 
concluded that the affinity data tolerate, at the 5% level, a position 
of the centromere beyond fi. However, they do not tolerate a position 
beyond Sd, for the limiting position for the 5% level (when \ = 1.34) 
is as follows: 


(14) 


ji——16, 16%, C4817, cq) 
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From the affinity data, tentative interference maps have been 
constructed. The positions of the supposed centromere for males and 
females given by the affinity data may be used to obtain a map in 
metrical units based on Owen’s model of the 1x? interference metric. 
This is done by taking trial metrical values, and verifying (by a method, 
using segmental functions, outlined in the Introduction to the tables 
of Fisher and Yates [1953]) that they coincide, within the limits imposed 
by the x°A, with the f7-C and a‘-fi values already derived. The ob- 
served fi-Sd value (from the three-point linkage backcross) is then 
used to obtain a trial Sd-C value, and this again must coincide well 
with that given by the affinity data. To see whether these trial values 
conform with the observed chromosomal linkage values, a x” test (which 
may be designated B) is performed. This tests the goodness of fit of 
the frequencies of the modes of gamete formation for the segments 
a'-fi-Sd expected on the basis of the trial values, with the observed 
frequencies obtained in the three-point linkage experiment. On the 
male data a xB equal to 4.5009 for 1 d.f. has been obtained; and on 
the female data, a x’B equal to 1.4857 for 1 df. 

Clearly, a map equally consistent, and in statistical agreement, 
with the three-point backcross data and the affinity data can be obtained 
on this system of mapping, by the minimisation of the summed x? 
(types A and B). The W-V affinity data, although the first of their kind, 
are not extensvie enough to warrent further calculation and adjustment 
beyond the remark that the centromere position most consistent with 
both bodies of data is probably somewhat nearer Sd than is indicated 
by the following provisional map (Figure 3). 


Arm 
toh (oJ bi at fi C S? length 
32.658 36.188 12.828 10.381 Recomb. values. 
34.576 42.057 14.210 < 11.636 90.843 Map distance 
22 42 31 95. Metrical distance 
Arm 
oie Il! at fi C S? length 
31°743 -2i.210) 15.052) «125865 Recomb. values. 
83.598 29.753 16.159 < 14.872 79.510 Map distance. 
22 29 34 85 Metrical distance 


FIGURE 3 
A Trntative Map or CoRoMosomEe V 
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I have not considered all the evidence for the supposition that the 
“point of attraction” indicated by the affinity data is in fact the centro- 
mere. There is some other qualitative evidence which will be given 
in the full analysis of the W-V investigation, meanwhile the maps given in 
Figure 3 provide quantitative agreement with this hypothesis. How- 
ever, two conclusory remarks, which may appear paradoxical, should 
perhaps be made. The W-V investigation cannot be said to prove 
conclusively that the “point of attraction” is in fact the centromere. 
On the other hand, the reasonably good agreement with this hypothesis 
obtained from the mapping procedure, is somewhat surprising, for two 
rather stringent assumptions have been made. First, it is assumed 
that there is no interference across the centromere, whereas it cannot 
be said that the available evidence excludes it completely; and second, 
the xi metric has been assumed to reflect absolutely the operation of 
interference from a‘ to Sd in chromosome V, whereas it is on its -agree- 
ment with Drosophila data that the expectation of a reasonably good 
reflection in Mouse data is based. 
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THE HERITABLE VARIANCE OF F, X F, PROGENY MEANS 


J. H. VAN DER VEEN 


Laboratory of Genetics, Agricultural University, Wageningen, Holland 


In discussing continuous variation in quantitative inheritance 
Hayman and Mather [1955] used eight parameters to specify the 
phenotypes when two pairs of genes are involved. Four of these cor- 
respond to main effects and were adopted by Fisher et al. [1932] and 
by Mather [1949]; d, and d, represent respectively the phenotypic 
difference between AA and aa and that between BB and bb, whilst 
h, and h, stand for the departure of the heterozygote from the midpoint 
of the homozygotes at locus A — a and locus B — b respectively. The 
four newly introduced parameters 7,:; , Jaje » Joja and l;,, correspond 
to interactions and represent respectively the interaction of d, and d, , 
that of d, and h, , that of d, and h, and that of h, and h,. The system 
of symbols can also be used for interactions between more than two 
loci, €.g. jaj,- for the hom-het-het component of variation. Hayman 
and Mather [1955] expressed the second degree statistics, derived from 
families originating from a cross between two true breeding lines, in 
terms of the eight parameters, assuming no linkage between the two 
loci and no differential elimination of gametes or zygotes. For the 
covariance of F.-parents and biparental progeny means they found 
the simple expression 4d; + }$d; + g7;,, and thought it to be of special 
value as it ‘‘--- provides in a sense a direct measure of the fixable 
heritable variance - - 

Now the heritable variance of means of progenies obtained by 
backcrossing F’,-individuals to F; , proves to give rise to an expression 
that also includes only terms in d’ and 7’. For 7 unlinked loci one 


obtains: 


poe 


(2")° (2') 


ea ene je 7 (tom 


iia 


2 1» hs 
ys tone beet + + Gatien to bo + Upper 
——— 
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In the case of two loci this becomes 3d? + 4d; + @rt2,; , which is alge- 
braically independent of Hayman and Mather’s covariance equation 
quoted above and so can be helpful in evaluating the hom-hom inter- 
action. 

As the parameters for digenic interaction are expected to be generally 
larger than those for trigenic interaction one may, in view of the rela- 
tively small coefficients of the terms for trigenic and higher interactions, 
use as an approximation in the case of 7 loci: 


iP eines 
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CORRECTION TO ABSTRACT 370, A. JANOSCHEK (Giessen). 
Quantum Biology and Reaction Kinetics. 


The formula for the law of reaction kinetics stated in this abstract 
(this journal, Volume 12, 1956, p. 227) was incorrectly reproduced. 
It should be written . 


N, = No{l — exp [—(k-D)?-#]}™. 


CORRECTION TO ABSTRACT 392, CHARLES W. DUNNETT 
(Lederle Laboratories, American Cyanamid Company, Pearl River, 
New York). Multiple Decision Procedure. 


In copying this abstract, a reference was omitted. The references 
should have read: 


Bechhofer (Ann. Math. Stat. 25: 16-39), Paulson (Ann. Math. Stat., 
20: 95-98) and Somerville (Biometrika, 41: 420-429). 


QUERIES AND NOTES 


Grorce W. Snepecor, Editor 


QUERY: Interpretation of x? Tests 


In a therapeutic trial, 31 subjects were each given two 
124 opportunities to express a preference for one of two analgesic 
drugs. The results were as follows:— 


Preference on No. of 
First occasion Second occasion subjects 
A A 18 
A B 
B 4 7 
B B 5 


If each subject has a probability p of preferring drug B on any one 
occasion, p is estimated as (2 X 5 + 8)/62 = 0.29, and the expectations 
from the binomial distribution (¢ + p)’ are 15.6, 12.8 and 2.6, giving 
ax’ with 1 df. of 4.33, P = 0.04. 

There is a marked deficiency of ‘‘mixed”’ responses, suggesting that 
p may vary from one subject to another. Following R. A. Fisher, 
Statistical Methods for Research Workers, section 19, we may test for 
this specific point by comparing the observed variance with that ex- 
pected. This presumably more sensitive test gives x? = 42.59 with 30 
d.f., P = 0.06, a less significant result. Can you clear up this apparent 
paradox? 


In each of the tests of significance used, the test criterion 
ANSWER: only approximately follows the standard x’ distribution 

whose significance levels are tabulated. The exact treatment 
of similar problems in connection with the Poisson distribution has 
been discussed by Fisher (Biometrics, 6, 17, 1950) and Rao and Chak- 
ravarti (Biometrics, 12, 264, 1956). The present example is particularly 
simple as there is only one way in which the observed figures can depart 
from expectation, so that both tests are different approximations to 
the same thing. If 31 subjects give a total of 18 preferences for drug 
B, there are only ten possible sets of observed frequencies, as shown in 
the table. The relative frequencies of these observed sets are inde- 
pendent of the unknown parameter p; if a , a; , @, are the numbers of 
subjects expressing 0, 1 or 2 preferences for drug B, these relative 


frequencies are given by 


31! 18! 44! a 
62! Bol dy! aa! 
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TABLE 1 

Observed 

frequencies Goodness of fit test Variance test Exact test 
IX ANB 18) 

or Xa) P xW P. Xigo) © |Brea: 2 

AD BAY B 
(Dy 32), @) 1 @) (5) (6) (7) (8) (9) (10) (11) 
13 18 0 5.19 .023 —2.28 .989 18.32 .953 .0292 1.0000 
14 16 1 1.98 .166 —1.41 .921 23.17 ~.808 .1598 .9708 
15 14 2 0.29 .591 —0.54 .705 28.03 .569 .3196 .8110 
IG if 3 0.11 .742 +0.33 .629 32.88 .328 .3029 .4914 
ly Ai 4 1246 2227 =-1.21> 2113 BMiotee  sllsirl .1470 = .1885 
18 8 6 4.88 087 +2.08 .018 42.59 064 .0868 0415 
19 6 6 8.72 .003 -+2.95 .002 47.45 .022 .0045 .0047 
20 4 i || ees — +~=«+3.82 — 52.30 .007 .0002 .0002 
Pil 2 8 | -22.05 — —3 4.70 — Dielom 2002 — 
22 0 o) pono) —'—s- ++5.57 62.01 .001 — 


These frequencies are given in column 10 of the table. From the cum- 
ulated values in column 11 we see that the exact probability of obtaining 
a sample as extreme as the one observed when sampling from a binomial 
distribution is 0.0415. 

The results of the two y* tests applied to the different possible 
samples are also shown in the table. It will be seen that the ordinary 
“‘goodness-of-fit”’ test classes together samples of the type observed 
and samples with too few ‘‘mixed” judgments, thus arranging the samples 
in a different order. To render this test comparable with the others, 
we may take the square root of x”, give it the sign of the discrepancy in 
the “‘consistent’’ classes, and interpret it as a unit normal deviate. This 
provides the results in columns 6 and 7 of the table. 

Neither of the x” tests approximate very closely to the exact figures, 
and the reason is not far to seek. In the goodness-of-fit test, the smallest 
expectation is 2.6, while in the variance test the mean is 1.42. Both these 
figures are small enough to upset the assumption on which the theory 
of the x” tests is based. A further factor lies in the heavy grouping. 

An alternative approach would be available if the numbers in the 
AB and BA classes were given separately. The data could now be 
set out in the form of a 2 X 2 table, and the usual test of significance in 
effect would test whether patients who prefer one drug on the first 
occasion tend to prefer the same drug on the second occasion. Such a 
test would not be affected by any shift in the general level of preference 
from one occasion to the other. With the small numbers involved, the 
exact form of the test would have to be used. 
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If we assume heterogeneity to be present (as is reasonable on general 
grounds apart from the results of the test of significance) p will differ from 
patient to patient. Over a population of patients it will then have a 
probability distribution. The mean of this distribution is estimated as 
0.29, and its variance can be estimated as .0865 (Robertson, Ann. Hugen., 
Lond., 16, 1, 1951). If we assume further that p follows a Beta dis- 
tribution, it is possible to show from the chart in Pearson and Hartley’s 
Tables that about 22% of the population have p > }, i.e. show an 
average preference for drug B. 

P. ARMITAGE 


London School of Hygiene 


M. J. R. Hearty 
Rothamsted Experimental Station 


125 NOTE: MISSING PLOT ESTIMATES 


H. Farrrietp SMITH 
North Carolina State College 


During the past four years there has appeared almost annually in 
Biometrics a comment on the meaning and error variance of a missing 
plot estimate (Snedecor and Williams [1952, 1953]; Nelder [1954]; 
Norton [1955]). Some discussion has centered around whether it should 
be considered ‘‘merely a number to be put in the empty space” to 
facilitate derivation of estimates of treatment effects and of the error 
variance, or may be considered as an estimate of the observation which 
was lost. None of the contributors seems to have noticed that the error 
variance of the estimate depends on what it is intended to estimate. 
This must be decided first and then the other aspects fall easily into 
place. 

Assume the usual model for a randomized block experiment with 
customary restrictions and assumptions 


Ug = 8 Te a Py As 47 (1) 
4=1--:m, gale. 
Let the missing plot be numbered 7 = 1, 7 = 1. In common notation 


G = the sum of yields on all (mn — 1) observed plots, 7’ and F# are the 
sums of (n — 1) and (m — 1) observed yields in treatment 1 and block 
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1 respectively, and the estimate of the missing yield is 


po, ME ne eG : 
Ym — Din — YD) ®) 


It is well known that substituting %,, for y,, and proceeding as if data 
were complete leads to the least squares estimates of the estimands in 
(1) and to the estimate of error variance with (m — 1)(n — 1) — 1 
degrees of freedom. If the estimates of the estimands be denoted by 
corresponding roman letters it is furthermore well known that 


Qn = 4+ Ok 1 (3) 


The error variance of #,, may be evaluated in three ways: 

(i) Equation (2) is a linear function of (mn — 1) observations. The 
error variance follows by the standard rule for the variance of a linear 
function. 

(ii) One may write out the least squares normal equations, invert 
the information matrix, and thence obtain from (3) 
var (91,) = var (a) + var (¢,) + var (r:) 

+ 2[cov (at,) + cov (ari) + cov (é7;)]. 

Alternatively, the formulae with which we are here concerned may 
be somewhat more easily found by writing the model with (m + n — 1) 
independent estimands, thus 


Yes = Oni 1 Fe 0p 7 6G 
7 = pi = 0. 
One then finds a,, = 9%, and its variance is indicated by the single 
leading element of the inverse matrix. 
(iii) One may use the analysis of covariance on a dummy variable. 
We shall see later that this leads to a different variance for the estimate. 


Procedures (i) and (ii) each lead to the formula given by Nelder 
with a typographical error corrected by Norton, 


(m+n — 1)o’ 


var ($i) = Cee san (4) 


Procedure (ii) demonstrates that this is the variance of the linear func- 
tion of estimates, (@ + #, + 1.) regarded as an estimate of (a + 7, + Dy); 
in other words it is H(a + t, — r, — a — 7, — p,) assuming that the 
data conform to model (1) so that H(§.;) = a +7, + p. 

But if y, is to be considered as an estimate of an actual yield y,, 
which is supposed to have once existed and been lost, one must recog- 
nize that y,, was not equal to its expectation. According to the model 
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it was postulated to deviate from (a + 7; + p;) by a random deviation 
with variance o” and independent of the random deviations of the other 


(mn — 1) observations. The variance of the difference (9;, — y;;) is 
therefore 


2 map n= 1 3 mno 

EQYu — Yu)” = (m — Din oz 1) ainee c= one SS eat (5) 

Since y,; is postulated to be a random variable one can raise the 
philosophical question as to whether a particular value of a random 
variable, as distinct from the parameters of its distribution, may be 
statistically estimated. However, equation (5) still remains true as a 
statement of the dispersion of discrepancies between 9,, and y,; . 

Some of the discussion in the literature cited was concerned with 
the permissibility of an estimate 9,, which was not a possible value for 
a real observation, the example being a case where all observations 
must be positive and the estimate for the missing value was negative. 
If all y;; > 0 then necessarily their expectations are positive, and the 
question really relates to compatibility of the estimate with 
(a + 71 + pi) > O irrespective of what the missing observation may 
have been. The variance (4) is therefore relevant and Norton’s dis- 
cussion is essentially correct. The test of significance for deviation 
from a theoretical value would be correct if the contrast were designated 
a priori as a particular one to be tested. However one may question 
whether a missing value determines such a choice. More correct pro- 
cedure might be to test the compound hypothesis that the (m + n + 1) 
estimates a, ¢; and r; are jointly compatible with the postulate that 
all mn linear combinations of a + 7; + p; should be positive. If yy, 
was not missing but was rejected because it was suspected to contain 
a gross error and one wants to test the evidence from the data them- 
selves that it may be incompatible with other observations the relevant 
test is (G11 — Ys) against variance (5). 

Consider now the method of covariance on a dummy variable, 


24, = —1, 2;; = 0 when 7j + 11. If the analysis of variance of y was 
made with y;, = 0 then the regression coefficient of y on 2, as estimated 
from the remainder row of the analysis of covariance, is b = Gy . 


Furthermore the remainder sum of squares of z is (m — 1)(n — 1)/mn, 
whence, by the usual rule for variance of a regression coefficient, 


var (6) = var (9) = as = variance (5). 


The reason for this is that the analysis of covariance regards y;, = 0 
as an observation with error variance the same as any other observation. 
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If it may seem peculiar to ascribe an error variance to the fictitious 
11 = O suppose that plot 11 was not missing but was damaged, say by 
disease. The observed value is then clearly on the same footing as 
other observations and if it be used in the analysis the regression 
coefficient is b (G1 — Yu). It is furthermore the solution which would 
be obtained for an extra estimand inserted in the model to represent 
effect of disease (Smith [1950]). 

Yates [1933] suggested that if an observation is suspected of being 
incorrect this may be examined by testing the significance of the reduc- 
tion of the error mean square when %,, replaces y;, in the analysis of 
variance of y. This reduction is the square due to regression in the 
covariance procedure. It readily follows from the foregoing that his 
proposal is equivalent to testing (4, — yi)” against variance (5). 


SUMMARY 


Discussion which has taken place about interpretation of an “esti- 
mate of a missing yield’”’ seems to have had its roots in putting the cart 
before the horse by asking what it estimates. The estimate may have 
either one of two standard errors. Which one is relevant depends on 
what one intends to estimate. That intention must be laid down as a 
prior postulate. The discussion is then easily resolved. 
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THE BIOMETRIC SOCIETY 


INTERNATIONAL 


International Biometric Seminar and Symposium 


The Biometric Seminar and Symposium held in Linz 24 September— 
3 October, 1956, were attended by more than 150 participants from 
eleven countries. 

During the Seminar, introductory lectures on statistical methods 
were given by Dr. Pfanzagel (Wien), Dr. Adam (Linz), Dr. Knédel 
(Wien) and Professor Weber (Jena). Practical classes were taken by 
Dr. Knédel and Mr. Voak (Linz). Dr. Knowles (Stockport) and 
Professor Linder (Geneva) lectured on experimental design, with 
practical classes under Dr. Wette (Heidelberg) and Mr. Abt (Zirich), 
and Professor Kellerer (Miinchen) on sample surveys. In addition to 
these formal lectures, a series of case-histories of statistical investi- 
gations under the title “Praxis der Planungsforschung”’ was presented 
by a number of speakers. 

Following the Seminar, a two-day Symposium took place. The 
following papers were read and discussed: Dr. Linser (Linz), and others, 
Wachstume—und Ertragsgesetze; Dr. Herdan (Bristol) and Dr. Augus- 
berger (Niirnberg), Beurteilung der Wirkung von Heilmitteln bei 
chronischen Erkrankungen; Dr. Behrens (Hanover) and Dr. LeRoy 
(Zurich), Die Giltigkeit des ¢-Testes. 

Organizers were Professor Linder and Dr. Adam. Assistance was 
received from the Austrian Ministry of Education, the “Land” Ober- 
oesterreich, the municipality of Linz and the Oesterreichische Stickst- 
offswerke A. G. The proceedings will be published in a forthcoming 
issue of the Statistische Vierteljahreshefte of the Statistical Institute, 
University of Vienna. 

The organizers of the meetings were successful in obtaining wide 
publicity of them, and considerable interest in biometric affairs was 
aroused. It is hoped that an Austrian Region of the Society may be 
formed in the near future. 


REGIONAL 
E.N.A.R. 

At the ninth Annual Meeting of the Region, the following were 
elected: Regional President—B. Harshbarger; Secretary-Treasurer—A. 
M. Dutton; Committee Members—O. Kempthorne and D. Mainland. 

Meetings are planned for Washington, D. C. during March (jointly 


ints, 
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with the IMS) and for Atlantic City, N. J. during September (jointly 
with the ASA). 


Australasian Region 


On one evening of the meetings of the Australian Mathematical 
Society (August 15-18) a general meeting of the Australasian Region 
of the Biometric Society was held. Dr. E. J. Williams was in the 
chair. Dr. E. A. Cornish gave a presidential address ‘“The correlation 
of monthly rainfall’. No business was conducted because of the large 
number of non-members present. 


Région Belge 


Au cours de son Assemblée Générale du 19 décembre 1956, la Société 
Adolphe Quetelet a désigné son nouveau Conseil d’Administration pour 
Vannée 1957: Président—M. R. Laurent; Vices Présidents—MM. 
Lecrenier, De Nayer, Welsch, Lebrun et Reuse; Secrétaire—M. L. 
Martin; Secrétaire adjointe—Melle A. Lenger; Trésorier—M. A. Rotti; 
Membres—MM. Roussel, Bontemps, Roggen. 

Aprés l’Assemblée Générale, M. H. Roggen a donné une conférence 
sur le sujet suivant: Analyse et Application des Séries Temporelles. 


British Region 


The following papers were read at the meeting held on December 18, 
1956: M. J. R. Healy and J. R. Tanner, The coding of human pedigree 
data; B. Woolf, The combination of information from a series of binary 
trials (2 X 2 tables); W. H. Potts and H. R. Simpson, The effect of 
sterilized males on a natural tse-tse fly population. - 

At the Annual General Meeting held on January 24th, 1957, the 
following were elected: Regional President—D. J. Finney; Secretary— 
C. C. Spicer; Treasurer—A. R. G. Owen; Committee members—D. H. 
Chitty, S. C, Pearce, P. A. Young. Following the business meeting 
the following papers were presented: R. E. Blackith, Discriminating 
topology of locusts and grasshoppers; J. G. Skellam, Estimation of a 


red locust population; P. Armitage and E. §. Snell, A sequential clinical 
trial. 


Région Frangaise 


A une réunion tenue le 21 décembre, 1956, M. Jean Porte a présenté 


une communication entitulée—Quelques problémes posés par l’analyse 
factorielle. 
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India 
The Biometric Society was represented by Sir Ronald Fisher at 


the 25th Anniversary celebrations of the Indian Statistical Institute, 
Calcutta. 


Netherlands 


The three biometric clubs met at Utrecht on May 18th, 1956, when 
the following papers were read: C. A. C. Nass, Non-parametric tests 
applied to problems of absenteeism in two firms; Ph. van Elteren, The 
sequential binomial test; Chr. L. Riimke, Sequential methods in pharma- 
cology. 

Meetings were held on February 23rd in Utrecht and on October 
24th in Wageningen in collaboration with the Studiekring voor Statis- 
tical Technique, with papers by the following: J. T. N. Venekamp, 
Comparison of methods for soil analysis; J. C. A. Zaat, Organoleptic 
tests; M. Keuls, Identification of the best partners in corn-breeding; 
F. E. Essed, Tukey’s gap-test. 

After a long lapse, the medical biological section of the Statistical 
Society met on December 11th in Utrecht, with the following pro- 
gramme: M. de Vries, Statistical thinking and Wilcoxon’s two-sample 
test; D. K. de Jong, Design for decision in clinical experiments; E. G. 
van Proosdij-Hartsema, An experimental design in pharmacology. 


MEMBERS 


The following notifications of changes of address and of location of 
new members were received during January, 1957. 

New Addresses 

Dr. Richard J. Henry, Bio-Science Labs., 2231 S. Carmelina Ave., 
Los Angeles 64, Calif., U.S.A. 

Mr. Timothy Prout, 3465 Sun Court, Riverside, Cal., Dis a. 

Mr. M. J. Garber, Citrus Experiment Station, University of California, 
Riverside, Cal., U.S.A. 

Mr. Norman J. Abramson, California State Fisheries Lab., Terminal 
Island Station, San Pedro, Cal., U.S.A. 

Mr. Joe Kennedy Adams, 1020 San Mateo Drive, Menlo Park, Cal., 
U.S.A. 

Dr. Daniel W. Cassard, Animal Husbandry Dept., University of 
Nevada, Reno, Nevada, U.S.A. 

Dr. Frances N. Clark, 947 21 Street, San Pedro, Cal., U.S.A. 

Mr. N. T. J. Bailey, Unit of Biometry, University of Oxford, 6 Keble 
Road, Oxford, England. 

Dr. George E. P. Box, Statistical Techniques Group, James Forrestal 
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Research Centre, Box 710, Princeton University, Princeton, N. J., 
Wash) 


New Members 

E.N.A.R. 

Miss Iris M. Kiem, 9393 8.W. 120th Street, Miami 56, Florida, U.S.A. 

Dr. Marvin A. Kastenbaum, Oak Ridge National Laboratory, P.O. 
Box Y, Oak Ridge, Tennessee, U.S.A. 


Netherlands 


D. K. DeJongh, Jacob van Ruysdaellaan 25, Heemstede, Netherlands. 
D. C. Post, Centrum Voor Landbouwwiskunde Duivendaal 8, 
Wageningen, Netherlands. 


New Regional Secretary 


British—Dr. C. C. Spicer, Central Public Health Lab., Colindale 
Avenue, London, N.W. 9, England. 


NEWS AND ANNOUNCEMENTS 


Members are invited to transmit to their National or Regional Secretary 
(of members at large, to the General Secretary) news of appointments, 
distinctions or retirements and announcements of professional interest. 


Dr. Henry L. Lucas, Professor at the Institute of Statistics, North 
Carolina State College, has been granted a five months leave of absence 
for a study and research tour at Princeton University. 


José Nieto Pascual has accepted an assistantship in the Department 
of Statistics, Iowa State College, and is working toward a Master’s 
degree in Statistics. 


Professor Alan Robertson of the Institute of Animal Genetics, Edin- 
burgh, Scotland, spent five weeks in January and February, 1957, 
as a consultant with the Institute of Statistics and Department of 
Genetics, North Carolina State College, U.S.A. A series of lectures 
was presented describing the work being done by Dr. Robertson at 
Edinburgh and Cambridge. 


Dr. George W. Snedecor of Iowa State College will be at the Institute 
of Statistics, North Carolina State College, U.S.A., to assist with con- 
sulting and analytical work during the absence of Professor J. A. Rigney 
who is at present in Peru. 


Dr. H. R. van der Vaart of Leiden University, Netherlands, will 
spend nine months with the Institute of Statistics, North Carolina 
State College, U.S.A., to continue his research and observe statistical 
methodology in the United States. 


Dr. E. J. Williams of the Commonwealth Industrial and Scientific 
Research Organization, Australia, has joined the staff of the Institute 
of Statistics, North Carolina State College, U.S.A., for two years as 
a Visiting Research Professor. He will serve as chief investigator on 
a research contract sponsored by the Office of Ordnance Research. 


It was wrongly announced in the September issue of this journal 
that Dr. Vincent Schultz was Associate Professor of Biostatistics at the 
University of California. Dr. Schultz is Associate Professor of Agri- 
cultural Statistics, Agricultural Experiment Station, University of 
Maryland, College Park, Maryland, U.S.A. 


NoticE 


The unit at the University of Oxford previously known as “Design 
and Analysis of Scientific Experiments” is now called “Unit of Biometry” 
The address of the unit is still 6 Keble Road, Oxford, England. 
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SumMMER SESSIONS IN STATISTICS 


Southern Regional Graduate Summer Sessions, U.S.A. 


The fourth Southern Regional Graduate Summer Session in Statis- 
tics will be held from 12 June through 20 July, 1957, at the Virginia 
Polytechnic Institute, Blacksburg, Virginia. 


Faculty will include E. J. Williams, Melbourne, D. B. DeLury, Toronto, J. L. McHugh, 
Director Virginia Fisheries Laboratories, and W. O. Ash, L. 8. Brenna, Ralph A. 
Bradley, C. W. Clunies-Ross, John E. Freund, Rudolf J. Freund, Boyd Harshbarger, 
Clyde Y. Kramer, and R. Lowell Wine of the institute. In addition, many foremost 
eastern U.S. statisticians will participate in seminars to be held on four days of each 
week. Courses offered will comprise Sampling of Biological Populations, Analysis 
of Variance from the regression viewpoint with multivariate generalizations, Rank 
Order Statistics, Stochastic Processes with particular reference to engineering appli- 
cations, Probability, Statistical Inference, Theory of Least Squares, Statistical Methods, 
Engineering Statistics and Sampling. 

Summer work in statistics may be applied towards residence requirements for 
graduate degrees at any of the cooperating institutions, and at some others. 


Inquiries should be addressed to Boyd Harshbarger, Head, Depart- 
ment of Statistics, Virginia Polytechnic Institute, Blacksburg, 
Virginia. 


Berkeley, California, U.S.A. 


The 1957 summer program in the Department of Statistics of the 
University of California, Berkeley, California, will consist of two ses- 
sions: June 17 to July 27 and July 29 to September 7. 


The faculty of the summer sessions will include professor D. A. Edwards of Yale 
University, Dr. Walter L. Smith of the University of Cambridge, England, Dr. M. M. 
Brooke, Dr. W. J. Hall, Dr. R. E. Serfling and Dr. M. B. Willis of the Communicable 
Disease Center, Public Health Service, Atlanta, Georgia, Dr. Nathan Mantel of the 
National Institutes of Health, Bethesda, Maryland and Professors Evelyn Fix, J. 
Neyman and R. A. Wijsman of the Department of Statistics of the University of 
California, Berkeley. 

The program includes two undergraduate courses in each session, adapted primarily 
to the needs of students transferring from other centers who would like to undertake 
advanced study at the University of California during the regular academic year. 
A research seminar in statistical problems of health will be conducted, with dis- 
cussions proceeding from medico-biological to statistical considerations. 


Iowa State College, U.S.A. 


The Department of Statistics at Iowa State College will offer six 


applied courses in statistical theory and methods in its two 1957 summer 
sessions. 
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These courses are planned primarily for graduate students or research workers 
with limited mathematical backgrounds who wish to use statistical techniques intel- 
ligently for application to other fields. In addition, a course on special topics in theo- 
retical or applied statistics may be studied at the graduate level. Senior staff members 
will be available during most of the summer for consultations on research or special 
problems. 

Students may register for either or both of the six-week summer sessions: June 
18 to July 24 and July 24 to August 30. Courses offered are: Stat. 401, Statistical 
Methods for Research Workers, (at the level of Snedecor’s Statistical Methods); Stat. 
447, Statistical Theory for Research Workers, (mainly theory of experimental statis- 
tics); Stat. 599, Special Topics; and Stat. 699, Research. In the second session will 
be offered Stat. 402, a continuation of 401; Stat. 448, a continuation of 447; two courses 
in applied methods which are more specialized; Stat. 411, Experimental Designs for 
Research Workers, and Stat. 421, Survey Designs for Research Workers; and finally 
Stat. 599 and 699. 


Inquiries should be addressed to Professor T. A. Bancroft, Depart- 
ment Head and Director, Statistical Laboratory, Iowa State College, 
Ames, Iowa, U.S.A. 


GRADUATE APPOINTMENTS IN Statistics AT Iowa STATE COLLEGE 


A number of appointments are available for students undertaking 
graduate major work in statistics at Iowa State College. These range 
from beginning graduate appointments at $125 to $225 a month for 
half-time teaching, research or consulting duties (additional appoint- 
ments or supplementary earnings may be available for the summer) 
to full-time advanced appointments at $4,500 or more per year. Most 
appointments for the coming academic year will be made about April 
lst. However applications received at a later date will be kept on 
file for consideration as vacancies occur or new positions become avail- 
able. Application blanks and detailed information may be obtained 
by writing to: The Statistical Laboratory, Iowa State College, Ames, 
Towa, U.S.A. 


AMERICAN CANCER Society FELLOwsHIPS (YALE UNIVERSITY 
GRADUATE SCHOOL) IN BIOMETRY 


A number of predoctoral fellowships (three years, stipends $2,000 
per year) and postdoctoral fellowships (one year, renewable, stipends 
begin at $4,000 per year) are available for students in biometry and 
biostatistics. For information write to Professor E. Cuyler Hammond, 
Director of Graduate Studies in Biometry, 30 Hillhouse Avenue, Yale 
University, New Haven, Connecticut. All applications for both types 
of fellowships should be submitted as early as possible in 1957. 


